Lex And Yacc

Lex is a lexer (lexical analyzer) generator, Yacc is an LALR(1) parser generator. LexAndYacc is a book by O'Reilly:

http://www.oreilly.com/catalog/lex/index.html

� � � � � � � � � �

Before those two pigeons learned to use Lex & Yacc, the tops of their heads were very smooth... Or maybe it was from tussling with the dragon...

Overviews and online manual pages for lex, yacc, flex, and bison can be found at "The Lex & Yacc Page": http://dinosaur.compilertools.net/


Looks like a cool site, where did you run across that, Elizabeth?

I searched dmoz OpenDirectoryProject for 'LALR' and soon ended up with a listing of Lexer and Parser Generators: http://dmoz.org/Computers/Programming/Compilers/Lexer_and_Parser_Generators/

Even as I speak I am working on parser generator implementation; looking for a yacc alternative, were you?

Nope, not in the slightest. I don't know a darn thing about parsing. (Sign me up for "Parsing for Dummies.") I was merely trying to find out what 'LALR' stands for! The UK FreeOnLineDictionaryOfComputing site (and WikiPedia) turned out to be informative.


I have finally learned to grok lex, and have written a tool or two. But every time I start in on the Unix manual page for yacc(1) I get stymied when I get to LALR(1) because it is not in chapter one of the manual. Snippet:

 # garrod@noc 145 $ cd ~ && man LALR
 man: no entry for LALR in the manual.

# garrod@noc 146 $ cd ~ && man -k LALR yacc (1) - an LALR(1) parser generator
I still don't grok yacc(1) -- can someone explain LALR(1) please?

[The notation "LALR(1)" is not a reference to a Unix manual section, despite the resemblence to the notation used in "yacc(1)". It's a mathematical notation. LALR(1) means "LookAhead, Left-to-right, Rightmost-derivation, 1-token (of lookahead)" ]

It's not a small thing that you need to look up in a glossary or a man page, it's a large thing from the mathematical theory of grammars. Typically people take a compiler course to learn about LALR and, as a side benefit, parser generators like Yacc. Or one could self-study from the "bible", nicknamed TheDragonBook because of the dragon on the cover: "Compilers -- Principles, Techniques, and Tools" by Aho, Sethi, and Ullman, second edition.

But briefly, yacc accepts the definition of a context-free grammar written in BNF (Backus-Naur Form) and generates a bottom-up table-driven parser that implements that grammar. Context free grammars are one step in formal power above regular expressions: they allow recursive rules, while regular expressions do not.

With help from a lexical analyser such as generated by lex, most programming language grammars can be considered to be context free, and in fact, most (but not all) can be implemented by the subset of context free grammars called LALR(1) ("Look Ahead 1 token Left-to-right Rightmost derivation"), and can in fact be implemented quite efficiently, time-wise. Yacc supports such LALR(1) grammars.

There are online tutorial introductions to Yacc; one can write simple Yacc grammars without intensive study. But it may not be possible to actually debug them and get them working without studying the underlying theory. -- DougMerritt

For what it's worth, WikiPedia has a fairly detailed article about "LR Parsers": http://en.wikipedia.org/wiki/LR_parser Its article about LALR is rather minimal, though. -- ElizabethWiethoff (But don't listen to me. Keep in mind I need "Parsing for Dummies"!)

Most good books on compilers I've read have at least a couple chapters devoted to the theory of parsing and state machines in general, and lex/yacc in particular.


See RegularExpression


CategoryBook CategoryCompilers


EditText of this page (last edited December 11, 2006) or FindPage with title or text search