PatternForm Version:
Aliases: Scripting System, Interpreter, Extensible Middle Layer
Problem: The mapping between the primitives available and the driver that controls them is too static. Combinations of the primitives are limited, and no universal data type exists to transfer data between components of the system.
Context: The primitive elements of the system are used in a limited but effective way. The applications serve existing needs using well-defined business rules. New needs are emerging that do not have well-defined business rules, but must be integrated as quickly as the rules are defined. The success of integrating the new needs into the system depends in part on the changes to the workflow that supporting the new features produces.
Forces: A context for rapid experimentation and novel functionality requires new infrastructure. New data types must be created to accommodate the need for a universal transport mechanism between all existing elements in the system. The added complexity of introducing such infrastructure to the system could markedly reduce efficiency of the existing system functions.
Solution: Build an extensible interpreter environment that makes all the primitive elements of the existing system available via universal data types. Provide 'exec' and 'eval' functions that create localized sub-environments where both inputs and outputs are defined in terms of universal data types.
Resulting Context: A single generic driver system provides an entry point for a data type that relates the combinations and relationships of the primitive elements. The relationships could be pipelines, multiplexing, or combining data in order to create the desired results. All the previous functions are supported, but a much larger context has been created at the same time.
Design Rationale: If a system is designed such that its driver layer can be removed from a scripting layer, and the scripting layer can execute strings composed by the script (like back-quotes in the Unix shell), then it achieves fractal complexity. The scripting layer sits on top of atomic functionality like regular expressions, mathematical functions, variables, database access, file control, etc. An extended concrete example: Customize a web server to take URLs that map to HTML pages with embedded Tcl, run as server-side includes. Cache the pre-processed page as a Tcl script, rebuilding it if the source file's timestamp changes. Customize the Tcl interpreter to add database access and some native Tcl functions for convenience. Since Tcl has an exec functionality, code can be stored in the database and coupled with a driver script for any template page. This kind of structure uses only 3 layers to achieve 2n+1 (n=1->k) virtual layers. The key seems to be that if the interpreter operates unaware of where it is being called from, then a wonderful simplicity suddenly arises, since the level of reusability of the interpreter itself jumps a level. The language itself allows for localized contexts that are unaware (at least directly) of where they are invoked from. Writing your compiler in the language it compiles is a similar idea, kind of like writing the YACC specification in YACC syntax (but that is not for the faint at heart).
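To make the exec/eval idea in the Solution concrete, here is a minimal Python sketch (the names are invented for illustration; the original example used Tcl): the hard layer registers primitive functions, and the soft layer runs script text in a localized sub-environment whose only input and output is a plain dict, the "universal data type".

  # Minimal sketch (hypothetical names): a soft scripting layer over hard primitives.
  # All data crossing the boundary is a plain dict -- the "universal data type".
  import math

  PRIMITIVES = {                 # the "hard" layer: atomic functionality
      "sqrt": math.sqrt,
      "upper": str.upper,
  }

  def run_script(source, data):
      """Execute soft-layer code in a localized sub-environment.

      The script sees only the registered primitives and the dict 'data';
      it never knows (or cares) where it was invoked from."""
      env = {"__builtins__": {}, "data": dict(data)}
      env.update(PRIMITIVES)
      exec(source, env)          # the 'exec' entry point from the Solution above
      return env["data"]         # results come back in the universal data type

  # Usage: the script is just text, so it can live in a file, a database row,
  # or a template page -- exactly the kind of storage the Tcl example describes.
  result = run_script("data['root'] = sqrt(data['x'])", {"x": 16})
  print(result)                  # {'x': 16, 'root': 4.0}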
Examples: Sh, Ksh, Csh, Perl, ToolCommandLanguage (Tcl), Python, LuaLanguage, Java (not strongly), BeanShell (for Java), compilers written in the language they compile, JavaScript, networks of WWW servers.
Related Patterns: DecideOnThePrimitives, CreateFlexibleMessaging, (http://laputa.isdn.uiuc.edu/metamorphosis/metamorphosis.html -- BrianFoote), DataDrivenPrograms
-- DavidCymbala
Couldn't find the metamorphosis paper at the cited URL - I assume it is now here: http://www.laputan.org/metamorphosis/metamorphosis.html#Metamorphosis --PaulMorrison
Short version:
By virtue of the first rule of optimization, go ahead and write most of your code in the highest level language you can find.
By virtue of the third rule of optimization, when you must, use a profiler and find the slow parts of your program. Take those parts, and write them in a lower-level language.
(Almost every language out there lets you drop down to C or the like. Some languages even make it easy (See: Perl's Inline::C module: http://search.cpan.org/~ingy/Inline-0.44/C/C.pod and Python's Pyrex: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/)).
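As a small illustration of the profiling step in Python (the function names here are made up; any language's profiler would serve the same purpose):

  # Sketch: find the hot spot first, only then rewrite it in a lower-level language.
  import cProfile
  import pstats

  def inner_product(a, b):
      # candidate hot spot: a pure-Python loop
      total = 0.0
      for x, y in zip(a, b):
          total += x * y
      return total

  def main():
      a = list(range(100000))
      b = list(range(100000))
      for _ in range(50):
          inner_product(a, b)

  cProfile.run("main()", "profile.out")
  stats = pstats.Stats("profile.out")
  stats.sort_stats("cumulative").print_stats(5)
  # If inner_product dominates, rewrite *that* function in C (Inline::C, Pyrex,
  # a ctypes-loaded shared library, ...) and leave the rest of the program alone.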
So why not use something like CommonLisp, where you have all of this in one place?
Was that a joke?
Doesn't sound like one to me. Lisp has a reputation for being slow, but there are (both free and commercial) Common Lisp implementations that can provide speed near that of C, C++, or Fortran when the code is written for speed. Most Common Lisp implementations provide some sort of built-in profiling facility, as well as the ability to call C code.
I've been learning CommonLisp, and although it's possible to write code with speed comparable to C, it's more difficult. You have to fight the inherent dynamicity of the language, and the higher level abstractions are more difficult to mentally correlate with the assembly code it will compile to. Experts can do it, but it'll take longer to become an expert. All in all, I'd rather do C for low level code, though avoiding FFI ugliness and having use of real macros may make it worth using CommonLisp.
I'd have to disagree. It took me about 2 weeks to learn how to generate really efficient code in CL (cmucl or sbcl). There are, granted, some exceptions and problem areas that can be a bit of a pain (some related to the fact that those compilers are maintained by volunteers). But you say it is more difficult. Do you mean more difficult than writing non-optimized CL code? I'd agree. More difficult than writing fast C code? In that case, I'd say not much. I've written a lot more high-performance C than CL, also. Writing fast CL code to a large degree involves putting in things you have to do *all the time* in C anyway, so it isn't a loss there. High level abstractions can be a problem, I agree, but that has nothing to do with the language you are using. Well, possibly in the case that a more powerful language (in this sense) like CL makes it easy to do things you would hesitate to even try in C, I guess that is true. But that is not the language's fault --- if you want high performance code you have to understand something about performance, after all.
Supposedly (because I've only read about this, not used it myself), Lisp implementations support both fast development and fast execution by producing very fast code when you declare the types of variables. You can develop in the dynamically typed Python/Perl mode and then, when you identify bottlenecks, add type declarations to allow code optimization based on static typing a la C++. Dynamism and efficiency, without rewriting your code in a different language. Also (in my experience, not just supposedly), Lisp macros allow you to disguise obfuscatory efficiency hacks that would force you to write less readable code in other languages.
In general, writing simple code in most cases and then marking up critical sections to help the compiler generate efficient code is an old trick that in most languages is considered arcane and only done using nonstandard extensions. When I write Python and C++ the biggest visual difference is all the type specifications. Why have two different languages? Future languages will, I hope, be designed with multiple modes of programming and execution in mind, so that they support the smooth evolution of code from stab-in-the-dark hacking to dynamic scripting to statically optimized application or library code. I'm hoping to find that already implemented in Lisp. (The Lisp advocates keep pointing out how many "new" features have been available in Lisp for years, so I figure I should turn to Lisp for the functionality that I expect to reach the mainstream ten years from now.)
Common Lisp isn't good enough here not because it isn't a sufficiently good hard language, but because it isn't a sufficiently good soft language. This is simply because of the unavailability of the necessary breadth of maintained and widely used (and thus better bug-fixed) libraries available from the intarwebs, forcing a DIY approach which interferes with RAD. Just compare the set of better-than-alpha-grade libraries/frameworks significantly (or at all!) updated in the last two years for CL and for Python/Ruby. You could say that this is not a language problem as such, but that's cold comfort in the real world, where vaporware is worth zilch and actually existing software matters. It's also easy to get support for Python/Ruby from the net, whereas lispers tend to be more likely to skin you (or your newbie co-developer) alive than help you, which will again slow you down comparatively speaking.
As one authors the system, one can experiment in the soft language, and then refactor repeatedly used or time-critical functions into the hard language. The decision to migrate influences future abstraction trade-offs. Functions in the hard layer are abstracted behind dynamic dispatch, with at least the Adapter DesignPattern. Functions in the soft layer are abstracted simply by replacing their source files.
The result over time is a system that has grown a perfect ApplicationProgrammingInterface in its hard layer; one that satisfies all the needs of the soft layer above it as if by magic. This comprehensiveness can surprise initiates to the system.
"Granma what a big API you have!"
"All the better to EmbraceAndExtend you with, my dear!"
Writing your compiler in the language it compiles is a similar idea, kind of like writing the YACC specification in YACC syntax (but that is not for the faint at heart).
Sure it is. The only slightly hard part about writing a Yacc grammar for Yacc is that it is an LALR(2) language whereas Yacc only accepts LALR(1) grammars, but this is trivial to work around, in the same way that one always does Yacc workarounds: have the lexer do it. In this case, have the lexer do the extra one-token lookahead (it comes up on RULENAME ':'... the lexer has to look ahead and see the ':' in order to know to return RULENAME rather than NAME). Everything else about Yacc in Yacc is really easy. And I've done this, so I'm not just speculating. -- dm
A trendy version of that is the part of the XML Spec where they describe DTD syntax using much the same syntax they are describing, as DTD is a lot like BNF. --DanielEarwicker
The language of XML Schemas can be described in every detail by an XML Schema. The Schema committee worked hard to make sure this was so. They kept a working schema describing the language, and I think it is actually the "authoritative" version (if it interprets something differently from the English spec, the schema takes priority).
Drawback No 1. Hard languages tend to have StaticTypeSafety. Soft ones usually don't. So square pegs end up in round holes, and who's going to warn ya? Eh? -- DanielEarwicker
Static languages perform type checks at compile time; dynamic languages perform type checks at run time. (Static languages such as Haskell and O'Caml can be good soft languages.) It's reasonable to worry that when static and dynamic languages meet, there will be some confusion about who should check, resulting in uncaught errors. "Did you check?" "No, I thought you checked!" In practice, types are checked in the dynamic language at the interface between the two languages. For example, before calling a C++ function, Boost.Python checks the Python argument types against the C++ function signature and throws an exception if the types aren't compatible. For example, when calling the C++ function void foo::bar(std::string), Boost.Python allows foo.bar('Hi there') (and automatically converts 'Hi there' to a C++ string), but foo.bar(3) results in an exception being raised. The run-time checking on the soft side makes up for the lack of run-time checking on the hard side.
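The same kind of boundary check can be demonstrated with nothing more than ctypes from the Python standard library (a sketch assuming a Unix-like libc is available): declaring argtypes on the soft side means a badly typed call fails in Python before anything reaches the C code.

  # Sketch: run-time type checking on the soft side of the boundary, using only
  # ctypes and the platform C library (assumes a Unix-like libc is present).
  import ctypes
  import ctypes.util

  libc = ctypes.CDLL(ctypes.util.find_library("c"))
  libc.puts.argtypes = [ctypes.c_char_p]    # declare the hard-layer signature
  libc.puts.restype = ctypes.c_int

  libc.puts(b"Hi there")                    # fine: bytes convert to char*
  try:
      libc.puts(3)                          # wrong type for char*
  except ctypes.ArgumentError as err:
      print("caught on the soft side:", err)   # nothing ever reached the C code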
My favorite example of this pattern is modern 3D shooter games (!). Quake 3: Arena has a particularly beautiful architecture -- the game executable is merely a library of optimized C code that renders 3D objects, plays sounds, and synchronizes data among networked players. The game logic itself (physics, weapons, scoring, etc) runs atop the engine in a flexible virtual machine. Game code communicates with the engine through system calls or "traps" (there is also a small shared memory segment for high-bandwidth communication).
In Quake 3 the game code is written in a dialect of C, and then compiled down to bytecode (which the engine may JIT compile for extra speed). This means VM programs can be run on any CPU to which the engine has been ported. Also, the VM environment is sandboxed to prevent malicious or buggy game code from harming your system. (implementing the game logic as native CPU code in a DLL - the approach taken by Quake 2 and Halflife - affords no such protection, and is platform-specific).
The advantage of this split between engine and game code is that you can swap out the game logic without recompiling the engine. Also, the discipline of maintaining a clean, well-defined "system call" API is a great benefit to third parties who want to write new extensions for the game.
It is also possible to make the game engine API available through other languages. I've seen versions of Quake that support game code running in a standard Java VM and in a Python interpreter.
All Quake-, Unreal- and Halflife-based games have architectures like this. I doubt they were the first games to use these techniques though.
-- Dan Maas
Interesting. Are the shared mem writes bounds checked? Any idea where to get the source? --AnAspirant
Yes, all memory access by the virtual machine code is bounds-checked for safety (even in JIT-compiled code, there is still a check). The shared memory segment exists as part of the virtual machine's heap, so it is still safe. The engine source code for Quake 1 and 2, and the game source code for Quake 1-3, are available at ftp.idsoftware.com. (Quake 1 has a similar virtual machine architecture; however, Quake 2 uses a native DLL for the game code.)
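In miniature, the engine/VM split described above amounts to routing every request from game code through a small trap table. A hypothetical Python sketch of that structure (not id Software's actual API):

  # Sketch (hypothetical names): game logic only reaches the "engine" via traps.
  class Engine:
      def trap_print(self, text):
          print("[console]", text)
      def trap_spawn(self, kind, x, y):
          print("spawned", kind, "at", (x, y))

  def syscall(engine, trap, *args):
      handler = getattr(engine, "trap_" + trap, None)
      if handler is None:
          raise ValueError("unknown trap: " + trap)   # nothing else is reachable
      return handler(*args)

  def game_frame(engine):
      # "game code": knows nothing about rendering, sound, or networking,
      # only the trap interface
      syscall(engine, "print", "frame begins")
      syscall(engine, "spawn", "rocket", 10, 20)

  game_frame(Engine())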
What about Z-code (ZorkCode)?
This seems like a really powerful pattern. I have lots of things to say!
Some more concrete examples would be helpful; I suppose the obvious one is Unix shell script, which has almost no useful operations of its own and yet is really rather powerful.
The name of this pattern is a bit confusing; is 'alternate' a verb or an adjective? Either way, it seems to suggest a system with many layers, alternately soft and hard, but it seems to actually talk about a system with just two layers, a hard low-level one (the components) and a soft high-level one (the script). Perhaps a better name would be SoftOverHard??
The idea that big, evolving systems are best built using two layers, one hard and one soft, implies that such projects will need two languages, one hard and one soft (where a HardLanguage is what we usually call a ProgrammingLanguage and a SoftLanguage is what we often call a ScriptingLanguage). This suggests that attempting to invent single languages to build big systems is folly; rather, we need to develop pairs of languages, one hard and one soft, that work well together - SymbioticLanguages, if you will. Can we observe this anywhere in the wild? Perhaps MicroSoft is doing it with CeeSharp/CeePlusPlus or VisualBasic/CeePlusPlus; the scriptable Office apps certainly look a bit like AlternateHardAndSoftLayers. One could argue that web browsers, supporting JavaScript and JavaApplets, are a hard-soft environment, but to my knowledge nobody has actually built a system in which modular applets are composed into an application on a single page using JavaScript, though it is certainly possible.
The idea of AlternateHardAndSoftLayers depends absolutely on the assumption that no one language can provide enough performance to be used in the hard layers and enough flexibility to be used in the soft layers; is this really true? SmallTalk and SelfLanguage are both very flexible and fast to develop in, and seem ideal for the SoftLayer?, yet modern VMs actually make them run quite fast (not as fast as C yet, but getting there); could they be used for both sets of layers, thus eliminating this distinction? I suppose one could argue that since the VM, and a set of primitive operations, is implemented in C, that constitutes the HardLayer?, and it's just a lot smaller and less application-specific than discussed above, but I don't think I'd buy this.
-- TomAnderson
Lisp has this feature; extremely high level but with native code compilation.
This misses the point. The issue is not interpretation versus native code compilation, but rather strategies for leveraged optimization and trade-offs between clarity and efficiency. It applies to replacing CeeLanguage code with AssemblyLanguage code just as much as it does to replacing LispLanguage code with CeeLanguage code.
How about replacing Lisp code with tighter Lisp code?
Let us say that you have a large program written in (to use my own preference) SchemeLanguage. By using the high level language, with a suitably optimizing compiler, you are able to write it in a manner that is both efficient and elegant - but there's a single large hot spot which is slowing down the whole program, even after careful refactoring. This one function needs to run as fast as possible, and any speed improvement in it would affect the overall performance of the program.
At that point, wouldn't sacrificing 5 lines of Scheme code in exchange for, say, 50 lines of optimized Assembly code to get a speed-up of an order of magnitude be a worthwhile tradeoff? If the rest of the program is well-written, it shouldn't even present a portability problem - after all, 50 lines is not very much, even in assembly. Even given the ForeignFunctionInterface overhead, this would be a Good Thing.
ObjectiveCee has the soft and hard languages in one "language" -- most of the flexibility of SmallTalk in the OO extensions, and the 'hard' layer of C/C++... Apple's AppleScript Studio puts an even "softer" layer -- AppleScript -- on top of the GUI classes provided by Cocoa, implemented in ObjectiveCee. --KeithRay
ForthLanguage traditionally has both hard and soft layers in the same language. Normal words are compiled to threaded interpreted code, but you have the option to use the assembler wordset to recode frequently used and bottleneck words as CODE words. Such words have full access to the machine's instruction set and can be referenced just like normally compiled words. Naturally, most of the core wordset of a Forth system (DUP, DROP, etc.) is implemented as CODE words. (Of course, Forth is pretty low-level itself, so this might be considered alternating Hard and Harder layers. Same with C with inline assembly.)
The book SoftwareDevelopmentOnaLeash ISBN 1893115917 seems to support the "AlternateHardAndSoftLayers" proposal.
From Amazon's Editorial Review: "He ties the building blocks together with structural and behavioral metadata, allowing simple, interpreted macros to drive everything from database access, screen layouts and many aspects of software development normally embedded directly into the software program. The rapid deployment effect this creates allows developers to perform simple surgical application changes, or rapid sweeping rework / enhancement - without changing compiled software."
Also see AdaptiveObjectModel, which SoftwareDevelopmentOnaLeash seems to be an instance of.
All Quake-, Unreal- and Halflife-based games have architectures like this. I doubt they were the first games to use these techniques though.
Many games written on the BBC micro 20 years ago used this pattern, with a combination of BBC BASIC for the game logic and assembler routines for the graphics.
I used this pattern for some numerical simulation work in chemical engineering in the early 1970's. This was on a Honeywell computer which had its main language as an interpreted BASIC. We had a FORTRAN compiler and could add compiled modules to the operating system which could then be called from BASIC. By modern standards we were very tight on memory, 16k if I remember correctly. The computer was a Honeywell 316. -- JohnFletcher
This pattern is used a lot by Python programmers, usually with the aid of tools like the SimplifiedWrapperAndInterfaceGenerator. JohnOusterhout wrote a paper in 1998 comparing development times for similar projects using system languages like C and scripting languages like TCL: http://home.pacbell.net/ouster/scripting.html
NatPryce also wrote a set of patterns for scripting languages: http://www.doc.ic.ac.uk/~np2/patterns/scripting/index.html
-- DaveKirby
My HtagLanguage is exactly such a SymbioticPair?. With DelphiLanguage of course being the symbiont. HtagLanguage provides the efficient HighLevel constructs while CustomCommands? written in DelphiLanguage provide the oomph and maximum flexibility.
-- SvenNeumann
How about this problem: I have a small section of code that must be in C (to read from an external device) and the rest of the code needs to write to a MySQL database. I was originally wanting to do this using Python, but the MySQL libraries don't look as well developed as Perl's. So, I'm trying to decide how I want to use Perl to access these C functions. I've used the Inline::C module and it seems to work, but is there a better way to do this? I'm developing this under the Linux platform, so is there a way you could simply compile any C file and then access its functions directly from Perl without using the Inline::C module? Overall, I've found that most languages make it difficult to AlternateHardAndSoftLayers. --BlakeMason
RubyLanguage is very nice for writing C modules. PythonLanguage is less good, but still good. PerlLanguage is absolutely a mess, but not impossible to work with. Try perldoc perlxstut to start with. -- RobertChurch
Before the family of Inline modules existed that was certainly true, but e.g. Inline::C changes that. --JohanLindstrom?
See also: MultiLanguageRefactoring
One big drawback with this approach is how much development time you will spend in between the layers. The time to write and debug the transitions between the layers can quickly become significant. It becomes another design task to construct the API, and the goals are moving targets as your project progresses. -- AnAspirant
One of the recurrent problems with many large projects, one that leads to their failure or to large delays (a large percentage of the total number of projects), is that interfaces between lower-level layers and "middleware" or applicative layers are underspecified and, worse, that by sloppy design or laziness of the programmers (which one cannot always control in large teams), the layers become ExtremelyInterstrangled, eventually leading to spaghetti designs.
In addition to adding flexibility when programming the applicative layers, where the vast majority of bugs reside and where 80 to 90% of the programming effort is spent, using two different languages forces one to make the distinction between lower layers and applicative layers clear. There is no possible trade-off.
In my experience, in large projects (> 20 man-years) the time taken at the beginning for designing well-specified interfaces, when the project is easily manageable, is never wasted. It is slow at the beginning, but it ALWAYS pays towards the middle and the end of the project, when things get complicated. On the contrary, projects that make wrong design decisions at the beginning because they are in a hurry to show quick results to the client will carry the burden all along the course of development, and this burden will always get heavier along the road.
Most importantly, the larger and the more complex the project, the more acute these problems are.
Also: someone may well have done much of the work for you nowadays. For instance, the BoostPythonLibrary deals with C++/Python interoperation. You'll still have to do some of the work, but it might not be as hard as you think. That doesn't mean it'll be easy of course!
There is also the SimplifiedWrapperAndInterfaceGenerator which can interface many other languages to C and C++. -- JohnFletcher
As another example of this pattern, what about LPMuds? There we have a hard layer, the driver, and a soft layer, the mudlib; the driver and mudlib may call into each other, the mudlib may override driver-supplied functions with different ones, and functions may be moved from the mudlib into the driver for speed. -- MartinRudat
Much of this discussion has considered using hard/soft to resolve CPU bandwidth limitations; the other constraint is memory, and that one is trickier to resolve in a soft/hard split. Many of the preferred "soft" languages have bulky memory usage. Replacing data structures built there with hard, lower-level ones can pose a nasty architectural issue unless the original data structure was used in a few well-defined, uniform ways. This is probably the number one limitation that stops a "maximally soft" programming style from being usable in all projects.
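A quick Python illustration of the memory point (the sizes are CPython-specific approximations): a list of boxed floats costs several times more than a flat array of doubles, but the swap is only painless if the original structure was accessed in a few uniform ways.

  # Sketch: measuring the memory cost of a "soft" data structure vs. a flat one.
  import array
  import sys

  n = 100000
  values = [float(i) for i in range(n)]          # list of boxed float objects
  packed = array.array("d", values)              # flat 8-bytes-per-element buffer

  list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
  packed_bytes = sys.getsizeof(packed)

  print("list of floats:", list_bytes, "bytes")    # roughly 24-32 bytes per element
  print("array('d')    :", packed_bytes, "bytes")  # roughly 8 bytes per element
  # The saving is easy to claim, but only easy to *get* if every access to
  # 'values' went through a few well-defined operations in the first place.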
See ToolChain, XsltAsSoftLayer, LowLevelPartsWrittenInCee, SymbioticLanguages, LushLanguage