Schwartzian Transform

A PerlLanguage idiom for sorting arrays according to a criterion that is a function of the elements in the array. It cleverly avoids computing the criterion value every time an element in the array is passed as a comparand to the sort block. The code for the transform is typically insanely concise.

The basic steps are:

Named after its inventor, RandalSchwartz, a prominent figure in the PerlLanguage community.

Also known as map-sort-map and decorate-sort-undecorate (DSU).

Described here: http://www.stonehenge.com/merlyn/UnixReview/col06.html

An example from the above article: "Suppose I have a file that has people's names in the first column, and bowling scores in the second column:

        Fred 210
        Barney 195
        Betty 200
        Wilma 170
        Dino 30
and that I want to sort this file based on bowling scores."

The article presents this solution to sort lines from stdin according to the numerical values in the second space-delimited column:

  #!/usr/bin/perl
  print map { $_->[0] }
    sort { $a->[1] <=> $b->[1] }
    map { [$_, (split)[1] ] }
    <>;


In PythonLanguage, it is customary to decorate the info so that the value to sort by comes first in a tuple and the second item in the tuple is the original info. The built-in sort method for Python lists does the sorting work. Then, to undecorate, just drop the first item from the tuple. Assume here the bowling info is read from a file f:

 bowling = [line.split() for line in f]
 decorated = [(int(pair[1]), pair) for pair in bowling]
 decorated.sort()
 undecorated = [pair[1] for pair in decorated]
 for pair in undecorated: print " ".join(pair)
It's easy in PythonLanguage to supply your own comparison function to the built-in sort method for sorting by special criteria.

 def compare(a, b):
return cmp(int(a.split()[1]), int(b.split()[1]))
 bowling = f.readlines()
 bowling.sort(compare)
 for line in bowling: print line[:-1]
But using the built-in sort method with a user-defined comparison function can be much slower than decorating each item to sort with the info to sort by, then using the built-in sort method as-is, then undecorating the items. Hence, the real purpose of DSU--at least in PythonLanguage--is to sort efficiently by special criteria.

In Python 2.4, DSU's importance is acknowledged by supporting it directly via the optional `key=' named argument to the `sort' method (and the new `sorted' built-in function). To built-in sort the above-exemplified `bowling' list by integer value of the scores, for example, you could code:

 bowling = [line.split() for line in f]
 bowling.sort(key=lambda name, score: int(score))
 for pair in bowling: print " ".join(pair)
This is maximally efficient (even faster than explicitly coding out the DSU as above). Almost as fast, if you want to sort "on the fly" without altering the `bowling' array, is the new `sorted' built-in:

 bowling = [line.split() for line in f]
 for pair in sorted(bowling, key=lambda name, score: int(score)):
   print " ".join(pair)


DSU is also the recommended trick in JavaScript for the same reason, if I remember correctly.


In RubyLanguage (as of version 1.8), the idiom is explicitly mapped via the method Enumerable#sort_by. For example, to sort numbers alphabetically:

  [1,2,3,10].sort_by {|n| n.to_s}        #=> [1, 10, 2, 3]
It is even possible due to the way Ruby does array comparisons to sort things based on multiple criteria. For example, to sort personnel records by last name, then first name:
  personnel_records.sort_by { |person| [person.last_name, person.first_name] }


EditText of this page (last edited February 15, 2008) or FindPage with title or text search