Embed References In Source Code

From SmalltalkLanguage

Why don't SmalltalkLanguage environments allow arbitrary embedding of objects in code instead of a limited set of built-in literals?

Can you give an example of what you want to do? I think the major SmalltalkLanguage environments already support "arbitrary embedding of objects in code", so I must misunderstand what you're looking for.

Well, arbitrary literals is definitely not what I had in mind. What I want isn't to have more literal types, more types of objects that you can build at compile time using special syntax. What I want is to get rid of all literals by being able to stick run-time objects at code time.

But I can't blame you for misunderstanding since embedding arbitrary objects only starts making sense after you're able to build arbitrary objects in a graphical and interactive manner. One of the Self papers describes just such a UI. But let's fake it instead.

Embedding an arbitrary object in code means that you can run ((OrderedCollection new add: 1; add: 2; add: 3; yourself) grabIt) (which grabs the resulting object into the clipboard) then paste that object directly into your code. When that code runs, it will not call a new object that's a copy of the old one, but the very same object that you pasted into your code.

The nearest to an object builder in Smalltalk is using inspectors to modify an object stored in a global. Embedding arbitrary objects means that you could then grab that object from the global, stick it in your code, and remove the global from the system.

What's the advantage of this? The sheer elegance and beauty of it. The uniformity, you can embed anything, not just things with a textual literal representation. And best of all, you get rid of testing for nil in the SingletonPattern since you only initialize the singleton when you create your code, instead of at run time.

Elegance always pays off with free features.

When I use IbmSmalltalk and the EnvyDeveloper repository, I free move objects and collections of objects to and from the repository and into my image. When I move them in, they are simply appear -- as binaries -- in my image. No compilation, no source code, they just come in. Their references are generally implied rather then explicit, at least at the level that they are used by a developer day in and day out. Can you sketch the differences between this and what you suggest?

The difference is that the topic of this page has nothing to do with filing in or out. It has nothing to do with building an object outside of the Smalltalk environment. You grab a reference in a browser and you stick it in your code in a browser. What this page is about is early, incremental compilation of code. Far earlier and more incremental than we're used to in Smalltalk. From whole methods down to individual objects. For example, you would be able to write Object, evaluate it immediately, then leave the reference to Object embedded in your source code for any future compiles and recompiles.

When I referred to grabbing a reference and sticking it elsewhere, I did not mean to imply that the reference would travel through the MS Windows clipboard. It could be stored by the browser, which is an entirely trivial operation for a browser written in Smalltalk itself.

And what it has to do with the singleton pattern is the following. Singleton typically requires a method that initializes the singleton only after testing for nil. With embedded objects, you would never need such a method, because the singleton would be initialized when you write its methods (at code time, not compile or run time) without needing to rely on any global variables.

EnvyDeveloper doesn't do "filing in or out" either. How's this: build an instance of CompiledMethod? once - either from source code or operations done on a possibly persistent parse tree. From that point onward, install that method into the class whenever you want to use it. Many operations are direct mutations of the CompiledMethod? itself or the parsetrees that generate it. The CompiledMethod? that is Singleton's accessor mutates itself to initialize or not initialize itself, based on things like the image state. I agree this stuff is cool. It's been there for years. You are describing the behavior of EnvyDeveloper, when it comes to methods, classes, pools, applications, maps, and so on, and perhaps GemStone and its ilk for the more general case. Are you familiar with the persistence approach of ObjectDesign, which used the virtual memory faulting mechanism of the hardware to page object images in and out of a large-scale object repository?

No, I am at all familiar with object repositories.

As for what you're proposing, I understand it to mean one of two things. First, you could be describing self-modifying code, which is bad in the general case. Second, you could be describing something like the parcel mechanism's preProcessing and postProcessing methods, except done on a per-method basis. This is uninteresting and trivial. I see nothing cool whatsoever involved.

Object repositories simply aren't a fundamental language construct. In contrast, code-time evaluation / binding deals with the most fundamental of all language constructs. That's why code-time binding is important on at least 3 different levels (uniformity of "literals", uniformity of evaluation and directness of code manipulation). Why your idea is important still eludes me. And what it has to do with embedding references in source code, how it got on this page in the first place, I simply have no idea.

No, not self-modifying code. No, nothing to do with "parcel mechanism's preProcessing and postProcessing methods". Smalltalk executes instances of CompiledMethod?, bound into instances of BlockContext?, which chain together in an instance of Process. This runtime metastructure has always been available for developers to use as they see fit. The object repository is an example of one way this runtime metastructure has been used. The "embedded references in source code" that you describe strike me as another closely related use. In the conventional approach, an instance of Compiler builds a parse tree (from source code) that is generally kept temporarily in an instance of Parser. The parser instance is then traversed to create the CompiledMethod?. I believe you are suggesting that the instance of parse tree could be kept persistently (instead of the corresponding source code), or could be created without source code. The resulting instance of CompiledMethod? could be completely interchangeable with the rest of the system. Meanwhile, there is a set of operations that you have suggested that can be accomplished using manipulations of the literal frame of a CompiledMethod?. An analogous set of opportunities exist that result from manipulating the instances of BlockContext? that form a running Process. I bring up EnvyDeveloper repeatedly because it is a product that uses these mechanisms and reflects the accumulated wisdom of a decade of widespread commercial use. Feel free to ignore it if you prefer.

What I'm proposing has nothing to do with either CompiledMethod? or BlockContext?. It has everything to do with the Browser. I don't know how the browser keeps source text, so let's call it SourceText?. My proposal is to alter SourceText? so that you can store references to precompiled objects and their locations in the source code in it, to alter the Browser so that you can store, inspect, retrieve and manipulate these precompiled objects appropriately when looking through the source code, and to alter the Compiler so it uses these precompiled objects appropriately when it runs across them.

I do know what the browser does with SourceText? -- it does not "keep" it. The browser is a connected group of several list panes and text pane (the same text editor used throughout the system). The browser collects the contents of its text pane and passes it to the Compiler, which builds the parse tree and so on I described above. SourceText?, as you call it, is a simply a string - any string will do, from any source. You are quite correct in contemplating an object, perhaps an extension to the browser, that can "store, inspect, retrieve and manipulate these precompiled objects". The objects that this extension is working with are almost surely instances of Parser, CompiledMethod?, BlockContext?, and Process. I think this exchange is at the point where real code has to be written before either of us can add much more value.

After rereading, yes, the extension of the browser could be dealing with the parse tree which Parser holds, which would be persistent. Of course, the parser would have to be modified to store comments instead of stripping them, and an unparse method would have to be created to reconstitute source as needed. It doesn't seem strictly necessary to do things this way for the proposal on this page, but it would be necessary for my other proposals (eliminating all syntax from the language, creating customizable syntax for representation). The Parser would end up slightly modified for this proposal and heavily modified for my other proposals. The browser would be extensively modified. There would be no modifications to any of CompiledMethod?, BlockContext?, and Process.

Feel free to try it that way, and see what you get. At least we seem to be communicating now. I use different words to describe what we seem to be discussing, but I think the effect is the same. Where you say "eliminating all syntax from the language", I say "rely on objects rather than any language". In my view, the objects that represent, for example, the parse tree are the only thing that matters. A "language" is, in my view, an import/export mechanism that allows those objects to be created in other languages. Since, as you've observed, I can decompile the parse tree into source (leaving aside problems with variable names and such), I think it's possible to emit any language we like -- Java, Smalltalk, Perl -- from the same parse tree. Instead of laboriously archiving enormous text files, I think we should be versioning object databases that contain the persistent parse tree instead. Are we now at least in the same library, if not the same book or page?

You can say that again.

For instance, you cannot emit any language you like from the same parse tree. This notion is blatantly false.

A parse tree will always be related to a language. It will be either a Smalltalk parse tree, or a Java parse tree, and so on and so forth. Worse than that, a VisualWorks parse tree will generally not be the same as a Squeak parse tree. It isn't the tree structure that's significant, the tree structure is irrelevant, it's the contents of its nodes that matter.

Finally, you get only minute space savings by storing source as parse trees rather than text files. All of the text still has to be there somewhere. The savings due to common method names mapping to the same symbol are going to be miniscule.

The advantages of using persistent parse trees don't lie in some miraculous increased ease of translation nor in any magical storage savings. They lie in vastly increased flexibility and the separation of syntax from the language, making feasible the use of multiple customized syntax.

Enough ranting. If someone wants to code this up, I encourage them to do so. As worthwhile as this feature would be, I simply do not have the time to devote to its clean implementation.

If I embed, say, a Color object as a literal (which I think would be really cool - imagine seeing a little splotch of green instead of the words "Color green"), and then later change the internal representation of Color objects (maybe to use HSV numbers instead of RGB), should the embedded literal be automatically updated? And if so, how? -- AdamSpitz

Ummm, that's completely off base. If you embed a Color object into code, the most uniform way to display it would be to print out aColor in the code, and highlight it as an embedded object, just like the browser might highlight something as a variable or argument.

There are two good reasons why a color object should never be portrayed as "a little splotch of green".

First is that colored code is syntax. And since SyntaxFollowsSemantics, you shouldn't waste precious syntax on utterly trivial semantic differences like different color objects. For example, the RBCodeHighlighter already uses colors to highlight different identifier types, this is good.

As long as I get my simple denotation of embedded objects, I'm happy.

-- RichardKulisz

The "splotch of green" thing is just a UI issue - the ability to embed things is mostly orthogonal to the issue of how to represent an embedded thing on the screen. We can argue about that later. :)

The question was: how early-bound is an embedded object? By embedding a reference to an actual object, rather than an expression that evaluates to an object, you're shifting the binding to be much earlier. ("Compile time", rather than runtime.)

Technically, that doesn't cause any new problems - the Self guys already dealt with the early-binding issues when they built the Transporter system. And I don't think I'd mind having this embedding thing be possible (like you say, it ought to be, just for uniformity). But usually I want something closer to the late end of the early/late spectrum. Maybe I should be able to build an object (graphically or whatever) and then convert it to a more-late-bound expression that evaluates to it.

-- AdamSpitz

An embedded object isn't bound at compile time but at code time. It's bound to the source code when you embed it. It doesn't need to be bound at compile time, because it already is. Alternatively, you can see code time binding as incremental compilation of individual objects instead of whole methods. IOW, as just automatic, interactive evaluation of code as soon as you write it, before you accept the method in the browser.

Code time binding to complex objects (eg. arrays) can be problematic. Is #(1 2 3) the same array as #(1 2 3)? That's why you should use copy to make sure. Which just goes to show why code-time binding was developed in a prototype-based language like Self. However, code-time binding doesn't require prototypes. You can bind the identifier Object before you actually compile the method, using #become: whenever you alter a class to get at the reference in the source code, as per usual.

If you want to reverse build your construction then you could use #storeOn:. Or the browser could keep track of your steps (starting from the last #newBuild send) as source code and splice them, as source code, wherever you want them. Now this issue isn't orthogonal to compile time binding, it's the exact reverse of it. Deferred evaluation instead of early evaluation.

What is the Transporter system? -- rk

Agreed that "code time" is a better word for it than "compile time."

"Transporter" is the name of Self's file-out system. (The name comes from the idea that you're transporting the objects from one image to another.) File-out is a much harder problem for Self than for Smalltalk, precisely because of this code-time binding stuff.

For example (and you probably understand this already, but I'll include it for the lurkers :), if you're working on the Wobulator module, and one of the objects in that module has a code-time reference to, say, the "traits collection" object, and then you file out the Wobulator module, what should happen? You probably don't want the "traits collection" object to get filed out as part of Wobulator, and get a new copy of the "traits collection" object when you file Wobulator back in. Rather, you want it to know that that slot should hold a reference to the object whose creator path is "traits collection". But if you file out the Collection module, you want the "collection" slot in the "traits" object to know that it's the creator slot for the "traits collection" object, and so it should file out that object and recreate it on file-in.

Anyway, the Self guys ultimately concluded that, for fundamental reasons, there just isn't enough information in the web of live objects to be able to file objects out and bring them back in properly. That's why you have to do things like mark slots as being "creator slots" in order to use the Transporter.

This embedding stuff is exactly the same problem, technically, so something like Self's Transporter ought to handle it just fine. But it's sure more complicated to think about than plain ordinary late-bound expressions.

I really like the idea of building something directly and then reverse-engineering it. #storeOn: wouldn't be quite good enough (because sometimes I want "Color brown" instead of "Color r: 0.6 g: 0.2 b: 0.0"), and neither would having the system keep track of my steps (because sometimes I do want "Color r: 0.6 g: 0.2 b: 0.0" instead of "Color brown"), but a combination of the two might work well. Or something. :) This is a cool idea. I need to think about this more.

-- AdamSpitz

Part of

EmbeddedReferences