Cee Plus Plus Templates Common Lisp Macros Comparison

Which is sort of like a boxing match between Oscar de la Hoya and Pee-wee Herman. --Not a SmugLispWeenie, but knows a mismatch when he sees one. :)

[From TemplateMetaprogramming - Can anybody think of a less verbose title for this page?]

Those who find the TemplateMetaprogramming stuff discussed in the GenerativeProgrammingBook interesting should definitely investigate CommonLisp, whose macros are immensely more powerful than CeePlusPlus templates: the whole power of the language is available to you for decomposing, analysing and constructing code at compile time (more precisely, at macro expansion time). See LispMacro.

Here's an example of something very simple that you can't (so far as I can see) do with C++ templates, but that's trivial with CommonLisp macros. (As a matter of fact, you probably wouldn't do it with macros in Common Lisp anyway; you'd probably use an inline function, which would work just as well.)

 (defmacro average (&rest args) `(/ (+ ,@args) ,(length args)))

This says: If you see (average a b c d), turn it into (/ (+ a b c d) 4). Note that the form (length args) is evaluated at macro-expansion time.

The use of templates specifically for optimization is paralleled by CommonLisp's "compiler macros".

While I have no argument that the above mentioned CL features are powerful, you are unfair to C++ templates. While often ugly, they can do some very surprising things (it turns out that C++ templates are TuringEquivalent). I expect that the above task can be met by three templates, but I am not going to attempt it off the top of my head.

They are indeed TuringEquivalent, but that doesn't mean they can do everything. (They can do everything up to isomorphism, but the isomorphism may involve rewriting the entire remainder of your program or making it all run exponentially slower or something.) If you reckon it's possible to do something equivalent to my average example with C++ templates, I'd be very interested to know how. A quick sketch of the idea would suffice.

See CompileTimeGenericAverageFunctionInCeePlusPlus for example implementations of variable sophistication.

As much nicer as the CL solution is, to give credit to C++, it isn’t nearly as difficult as it is ugly:

constexpr int sum () {
return 0;
}

template <typename T,
  typename... Rest>
constexpr T sum (T t, Rest... rest) {
return t + sum (rest...);
}

template <typename... Args>
constexpr auto avg (Args... args) -> decltype (sum (args...) / size_t (1)) {
return sum (args...) / sizeof... (args);
}

"While often ugly, they can do some very surprising things (it turns out that C++ templates are TuringEquivalent)."

This points to the really important difference between TMP and Lisp macros: Lisp macros enable you to make anything look as beautiful as possible, whereas TMP tends to look ugly. -- PascalCostanza

You may want to pop by http://oonumerics.org/blitz/papers/ BrokenLink -- it contains several interesting papers, especially "CeePlusPlusTemplatesAsPartialEvaluation?", which contains an example of a generic average function that gets a majority of the information it needs at compile time.

I have read "C++ templates as partial evaluation"; in fact, I have a copy on the desk here in front of me. It was precisely C++'s inability to do variable numbers of arguments properly that I had in mind, and I'm relieved to find that I wasn't mistaken about it. :-)

This is a fine example of what I said about "TuringEquivalent" not being the same as "able to do everything". Every programming task (if you ignore trivia like textual input and output, graphics, networking, executing in a reasonable amount of time, etc) is equivalent to something you can do with C++ templates. That doesn't say anything much about how useful C++ templates are for metaprogramming. It turns out that they're pretty useful, but they still aren't a patch on Common Lisp macros. Or even Scheme or Dylan macros.

(At some point we ought to do something about the now slightly confusing ThreadMode here...)

Agree about the above comment on ThreadMode. However, for the moment I am just going to make it worse. My CL is very rusty, so perhaps you can comment on a 'reversal' of the above situation. That is, a common (at least in numerics) problem where C++ templates do a nice job... and what would the equivalent CL look like:

What I am thinking is something like this (support code missing, obviously). This should (from memory) be something like BlitzPlusPlus syntax.

Array<double,3> array(10000,10000,10000);
...
Array<double,2> foo(5000,5000);
Array<double,2> bar(5000,5000);
...
Array<double 2> subarray(array(Range(0,5000),Range(0,5000),500);
...
Array<double,2> result(5000,5000);
result = subarray*foo + alpha*bar +subarray*bar + beta*foo;

Ok, so the main points are that subarray must *not* copy any values. It is a view into a particular slice of array. Also (and here the ExpressionTemplates are used), the calculation of result must not create any temporaries.

Can this be done as cleanly (with the above efficiency constraints) in CL. I realise the CL won't (probably) execute as quickly, but it should have the same sort of raw mem footprint and execution time.

Does it really do matrix multiplication and use no temporaries? I don't think I believe that's possible without (in some circumstances, at least) a heavy time penalty. This is a matter of algorithms, not of languages. Anyway:

Yes. Well, actually the above would have a couple of temporary doubles but no temporary objects --- I should have been more specific. Perhaps you are thinking of the case foo*bar*baz ? If you don't use a temporary here you lose badly on complexity. On the other hand, I sometimes have code where a temporary matrix is never an option (will exceed physical memory) ...

The main loop would expand to something like (p-code ish):

for i, j
  result(i,j) = subarray(i,Range::all)*foo(Range::all,j) + alpha*bar(i,j) + subarray(i,Range::all)*bar(Range::all,j) + beta*foo(i,j)

and the subarray(i,Range::all)*foo(Range::all,j) would then expand to vector products summed into temporary variables.

come to think of it it is possible to avoid the temporary doubles also, by splitting into three statements...

I heard it claimed just recently that actually BlitzPlusPlus doesn't do matrix multiplication -- only elementwise. That would be pretty nasty if true. Can you confirm, one way or the other?

Hmmmm. I haven't used blitz for quite a while. I have written some similar code. However, it is important to note that blitz++ is *not* a linalg library, and is based on an array class, not a matrix class. In other words, you could build a BLAS-like library out of blitz++, but not the other way around. So I expect that the basic Array<T,N> type in the library does not supply matrix multiplication, as that wouldn't make sense. On the other hand, the last time I looked at it there was a Matrix class in pre-beta, which may be usable now.

Depends on what you mean by "as cleanly", I suppose. The main issue is that you can't overload the standard operators for assignment, arithmetic etc. But you can define your own for use with matrices and the like, which is almost as good. Thus:

Sure, I didn't mean the operator overloading, just that the design should closely mimic what we write on paper.

Define a class for sliceable arrays: a sliceable array consists of a vector, the index of the first element to use, and a bunch of "strides". (I think this is Blitz++'s internal representation too.) Define a SUBARRAY function that just diddles with the strides etc.

You need a vector of base elements too, as arrays have any indexing you want, e.g. -10..10. But similar idea. In the case of blitz++ and other array classes, the vector will be reference counted and thus the mechanism to avoid deep copies.

At this point, we have two options. One is slightly more efficient; the other integrates better with the rest of the language.

First option: define a new macro for doing array assignment. It should do some sort of tree-walk on the RHS, much as Blitz++'s ExpressionTemplates effectively do. Not terribly hard, I think. Now, you do all your array operations using this. As it stands, this won't evaluate array expressions efficiently anywhere other than on RHSes of magic assignments, so define another macro that means "make a new array object, and assign stuff to it". Note that the RHS of our magic assignment thing can use the usual names for arithmetic ops, because it's going to be translated into something else anyway.

Second option: define a new class called DELAYED-ARRAY or something. A DELAYED-ARRAY is a promise to deliver an array on demand, together with (some representation of) the array expression to evaluate. Define arithmetic (etc) ops on arrays that yield DELAYED-ARRAYs. Define arithmetic (etc) ops on DELAYED-ARRAYs (which can have the same names as the ones for ordinary arrays, even though neither can be called +,-,*,/ etc) that also yield DELAYED-ARRAYs. Actually asking for an element of a DELAYED-ARRAY "forces" it and makes it do the calculation. The first thing it does is to work out what calculation it really needs to do. Then it does it, and this latter bit can be very efficient.

I've been waving my hands here. Here's how your example above might look, with the first scheme.

  (let ((array    (sliceable-array 10000 10000 10000))
        (foo      (sliceable-array 5000 5000))
        (bar      (sliceable-array 5000 5000))
        (result   (sliceable-array 5000 5000))
        (subarray (slice array (range 0 5000) (range 0 5000) 500)))
    ...
    ;; To assign to a SLICEABLE-ARRAY, use ASETQ.
    (asetq result (+ (* subarray foo)
                     (* alpha bar)
                     (* subarray bar)
                     (* beta foo)))
    ;; (A stuff) === (let ((temp (sliceable-array))) (asetq temp stuff) temp).
    (do-something (a (* foo bar))
                  (a (+ result foo))
                  result)
    ...)

With the second scheme:

  (let ((array    (sliceable-array 10000 10000 10000))
        (foo      (sliceable-array 5000 5000))
        (bar      (sliceable-array 5000 5000))
        (result   (sliceable-array 5000 5000))
        (subarray (slice array (range 0 5000) (range 0 5000) 500)))
    ...
    ;; The following doesn't actually do any array calculations at all!
    (setq result (add (mul subarray foo)
                      (mul alpha bar)
                      (mul subarray bar)
                      (mul beta foo)))
    ;; Nor do the args of this:
    (do-something (mul foo bar)
                  (add result foo)
                  result)
    ;; But doing this will make RESULT evaluate itself.
    (ref result 123 234)
    ...)

Unexpected upside of the second approach: If you do a lot more array calculation than actual element use, there may be opportunities for more optimization to occur than you'd expect. Unexpected downside: If the values in the arrays change before the evaluation is done, you lose. But then, with shared data as in Blitz++ you always have to worry about that sort of aliasing problem.

Right. I have used the same sort of approach in C++ (comment below).

One possibly-important difference: C++ is (mostly) StaticallyTyped, CL (mostly) DynamicallyTyped. This means that some things you can do at compile time in C++ simply can't be done at compile time in CL. Well, maybe with clever MetaObjectProtocol diddling you could have parametrised classes and do clever things with those. That makes my brain hurt.

Yes, this is primarily the difference I was thinking about, and interested in the implications. It works the other direction as well, there is a project to (partially) implement fp in c++ templates, but you can't get LambdaExpressions right because of the typing... IIRC.

www.boost.org has lambda expression library for C++, i have not used it tho

: OK, here's how it looks to me.
: The main issue is with the first of my two approaches. You want to be able to distinguish between scalars and vectors and arrays at compile time so that you can generate good code. C++ templates can do this because the types of all the constituents are known at compile time and form part of the pattern matched by the templates. CommonLisp can't do it quite like that.
: However, the better CommonLisp compilers do have rather good TypeInference. So what you can do is to define an inline function that checks what kind of object it's dealing with and then behaves accordingly. If your compiler is good enough, all the unnecessary code will be proved unnecessary by means of type inference and then optimized away.
: In the interests of honesty, I'd better admit that I don't know whether this would actually work well. Making it do a good job could be tricky.

Does all that make sense?

I think so. Your second scheme is basically LazyEvaluation, no? There are problems with this in C++ in that operators now need to take in arbitrary delayed expressions, and encoding the operations can be tricky. The typing in C++ makes things worse, and IIRC it can not be done in a general enough way. I expect CL has an advantage here.

The first scheme requires temporary matrices, does it not? That would bury this option for my needs :(

Yes, the first scheme is lazy evaluation. No, the first scheme doesn't require temporary matrices. It could do substantially the same as BlitzPlusPlus ExpressionTemplates do. In fact, it could be more sophisticated -- e.g., it might be able to optimize away some of the iterators if it can see that they refer to the same vector. But then, the compiler will effectively do that later anyway when it does CSE, so maybe that's not worth worrying about.

But I think, on reflection, that I like the second option better. For typical applications, you probably want to make it generate and compile code at runtime, and use memoizing so that in typical uses -- including all uses the C++ approach can cope with -- it only actually has to do that once.

I see that the simplicity of the CL approach above is due to fairly powerful built in array type. However, a little reading and my limited knowledge of CL leave me with this question: Is it possible to slice out dimensions in CL arrays (This is trivial to implement in the sort of C++ template approach given above, btw). I can see some stuff about 'displacements' in various places on the web, but I can't see how to solve this problem with them. To be specific, I want to take a 3d array, lets say, and slice out a 2d subarray from it. Of course the result has to be a proper array, and shouldn't copy data. Is this possible with the standard CL array type?

Depends on whether you want a rectangular sub-slice or not. Common Lisp's arrays are defined to be stored in row-major order, and any displacement (slicing) will be done sequentially starting at whatever index you specify. If you want to do rectangular slices (for example, a rectangle in the "middle" of a bitmap), known as conformally displaced arrays, you will have to write your own type and accessors (easy to do per-application, but unfeasible for the whole system, which is annoying considering that Lisp Machine Lisp had conformally displaced arrays (and bit-field sized displacement, and lots of other neat stuff left out of ANSI CL)). The rationale for this is to make simple array access as simple and fast as possible (otherwise there would have to be a branch in every array accessor to check for conformal displacement and use their index-calculation routine). Conformal displacement index calculation is pretty expensive as well - it requires several comparisons for each dimension in the worst case.

CategoryCpp CategoryCppTemplates CategoryCommonLisp CategoryLisp