Pattern Matching In Java

Context PatternMatching is a tried and true technique in FunctionalProgramming. Unfortunately it is not available out of the box in Java. It is even more important when information to be processed are trees (XML) with a complex and variable shape. In absence of an elegant solution using a tree oriented API (DOM, SAX) for picking up the nodes is an exercise in frustration and grunt work

Solution The solution is to define a tree pattern where special nodes in the tree (for example denoted by a starting ?) are placeholder that will match any klind of actual information in the data tree and bind that sub-tree to a variable. A UnificationAlgorithm? will then match the data against the pattern, binding variables and calling a method that will have all the parts of the tree that are of interest conveniently bound to its variables.

'''Example. In current WikiChangeProposal project, the content of wiki pages is internally stored as S-Expressions. In order to transform it into the final HTML the following S-Expressions is used for pattern matching:

( ((h ?level . ?rest) . onHeader) 
  ((p . ?rest) . onP) 
  ((ul . ?rest) . onUL) 
  ((li . ?rest) . onLI) 
  ((table . ?rest) . onTable) 
  ((tr . ?rest) . onTR)
          ((td . ?rest) . onTD) 
  (hr . onHR) 
  ((a (href ?hrefValue)) . onExternalLink1) 
  ((a (href ?hrefValue) ?label ) . onExternalLink2)
  ((a (?internalLink ?team ?version) ?label ) . onInternalLink2)
  ((a (?internalLink ?team) ?label) . onInternalLink2)
          ((a ?internalLink ?label) . onInternalLink)
  ((chess . ?list) . onChess)
  ((pre  ?any) . onPre )
  ((viewpoints . ?list ) . onViewpoints ) 
  (?any . onDefault)
)

The CDR of all the elements in the list (represent the names of Java methods to be called. Those methods will be called with the parts of the tree referenced by ?xxx placeholder bound to method parameters. For example:

 public void onPre(SExpr s) throws IOException {
os.write("<pre>\n".getBytes("ASCII"));
htmlOs.write(s.asString().getBytes("ASCII"));
os.write("\n</pre>".getBytes("ASCII"));
}

Why not use an embedded Lisp like Jatha http://sourceforge.net/projects/jatha/ ? Or embedded prolog like JLog http://jlogic.sourceforge.net/ which is based on the UnificationAlgorithm?

That would have been one very sound possibility, and I'm not sure if it wouldn't have been better than what I ended up with. The reason why I didn't embed a Lisp or , as I have more experience and trust, with embedded Scheme , is that I started with WcpEssExpression? from another exploratory project so I added a few syntactic sugar, and once you have the s-expressions in place, writing the pattern matching is rather trivial. In my wildest dream, the WikiChangeProposal project may end up with an embedded Scheme interpeter and with user-editable programs as pages, but given that it currently is a very minimalistic project built from very scarce spare time, I was reluctant to do that until I can manage the security implications.

I think there's value in applying this DesignPattern to projects less suited for Scheme interpreting such as in the context of distributed messaging or simply in projects where people don't know Scheme/Lisp. Also, the idea should be migrated to XML (maybe they do it already with XML data binding API? - haver to check it out). In any case I've seen some pretty ridiciulous code walking the XML trees by hand and imbricating lots of ifs and lots of boolean state to figure out one case or the other, and I've seen the opposite that the shape of the XML is made annoyingly rigid in order to keep the processing very simple.

Next step is to allow for "order - independent" matching of nodes, optional nodes and default values. I have yet to figure out how I can use the * and | regular expression operators in this context -- dispatching to a piece of code with bound variables. OnLisp has a useful discussion and I also found ftp://ftp.research.microsoft.com/pub/tr/tr-98-55.pdf


This DesignPattern has some similarities and overlaps with XSL transformations.


EditText of this page (last edited October 22, 2005) or FindPage with title or text search