Recursive Descent Parser

A recursive descent parser is a top-down parser which basically has a function for every nonterminal. The function implements a grammar rule by calling other functions to read the right-hand side. So basically if you have a grammar rule that looks like this:

  statement <- begin statements end | if expr then statement

it turns into a function that looks like this (pseudo-Java):

  Statement readStatement(InputSource s)
  { if (s.peekTokenType() == TokenType.BEGIN)
    { s.read(); // read BEGIN token and discard it
      Statement[] statementList = readStatements(s);
      if (s.peekTokenType() != TokenType.END) throw new SyntaxError("END expected");
      s.read(); // read END token and discard it
      return new BeginStatement(statementList); // BeginStatement extends or implements Statement
    }
    else if (s.peekTokenType() == TokenType.IF)
    { s.read(); // read IF token and discard it
      Expression expr = readExpression(s);
      if (s.peekTokenType() != TokenType.THEN) throw new SyntaxError("THEN expected");
      s.read(); // read THEN token and discard it
      Statement statement = readStatement(s);
      return new IfStatement(expr, statement); // IfStatement extends or implements Statement
    }
    else throw new SyntaxError("BEGIN or IF expected");
  }

Recursive descent parsers are easy to implement, but they are limited in the languages they can handle. For example, parsing ordinary arithmetic expressions requires more work from a recursive descent parser; CompilersPrinciplesTechniquesAndTools describes a way to refactor the grammar to accommodate them, but there are other ways.

Bottom-up parsers such as LR(1) parsers have the ability to see more of the input and shift it onto a stack before they decide how to reduce it. LR languages are a superset of the LL languages that can be parsed by recursive descent. However, LR parser generators are considerably harder to implement.


To interpret an expression like "5 * (4 + 2^2)", an evaluator must evaluate from the ( to the matching ) as one factor, before evaluating the * to multiply it by 5. Then, inside the (), the evaluator must evaluate the ^ first, before the +.

These techniques require recursion, to obey math precedence rules, and to interpret nested parenthesis. A RecursiveDescentParser reads each token inside a recursively declared filter system:

 - terms +-
  - factors */
   - exponents ^
    - parenthesis ()
     - terms recursion
    - terminal values 99.9

After finding a sought item and recursing thru the other possibles, the RDP then evaluates its intermediate terms, and returns them to the next layer up.

An example of an RDP, written via TestDrivenDevelopment, appears here:


Question: What is LR(1)? Every time I see it, I expect to find a Unix man page in section one. Perhaps we need a page about LrParsers Bottom-up parsers such as LR(1) parsers have the ability to see more of the input and shift it onto a stack before they decide how to reduce it. LR languages are a superset of the LL languages that can be parsed by recursive descent. However, LR parser generators are considerably harder to implement.

The LR(1) notation means, "LR parser with 1 token of lookahead", not a Unix man page. Grammars can be described as being X(k), meaning "can be parsed with parsing algorithm X using at most k token(s) of lookahead." LL(1), SLR(1), etc. -- ScottVokes

From Wikipedia: An LL parser parses the input from Left to right, and constructs a Leftmost derivation of the sentence (hence LL, compared with LR parser). An LR parser is a parser that reads input from Left to right (as it would appear if visually displayed) and produces a Rightmost derivation. The term LR(k) parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions. Usually k is 1 and the term LR parser is often intended to refer to this case. -- ChristianFriedl

See also RecursiveDescent


EditText of this page (last edited July 29, 2011) or FindPage with title or text search