Regular Expressions Arent

RegularExpressions as implemented by many regex packages are often not regular languages, but instead as powerful as any context free language (even if this is not necessarily easy to use).

This means, that RegularExpressionSetArithmetic is ironically not implementable in these more-powerful packages (or at least not efficiently so).


The following discussion originally stated this, but seems to have a focus on perl and mixes up NFA and context free.

Regular Expressions Are Not:


What can and cannot be handled

What it means is this: So-called "NFA engines" do not use non-deterministic finite autonoma in their implementation (indeed, as we cannot build a NFA). They use computational models that approach the power of TuringMachines and as such can handle things that a true regex cannot handle.

The following grammars cannot be handled by a true regular expression: EssExpression ::= Atom | ( EssExpression* )

  EqualLengthAbc ::=  A ^n B ^n C ^n

In other words, some number of A followed by an equal number of B and an equal number of C. That grammar is context-sensitive, and requires a linearly-bounded TuringMachine to parse.


What PerlLanguage handles easily

PerlLanguage, and the similar regex syntax found in many other languages, can handle these easily. The engines that they use are full backtracking TuringMachines.

Plus, arbitrary Perl code can be specified for the actions of a Perl regex; and Perl is obviously TuringComplete.


More on Context-Free

Your explanation is kind of wandering all over the map. If you add to regular expressions labelled rules and the ability to reference them within rules, then you get extended BNF, which of course is context free.


MuddlingAround?

Chomsky's 4 levels of power are not this complicated. Saying "regular expressions aren't" just muddles the subject.

Also your terminology is incorrect: "NFA engines" do not use non-deterministic finite autonoma in their implementation (indeed, as we cannot build a NFA). No - you are mixing up the term "non-deterministic" from other areas of math. In the mathematical theory of grammars, it is completely standard to talk about "non-deterministic finite automata", and NFA is simply an abbreviation for that!

The comment you responded to was a complaint I was making some months back about wording. I understand the topic. The wording at the top of the page is muddled.


See also: AlternativesToRegularExpressions


CategoryRegularExpressions


EditText of this page (last edited November 8, 2006) or FindPage with title or text search