A modest proposal ... or a request for pointers and info if someone's heard of this before ...
RegularExpressions are sets. Their contents are strings - also, in principle, decomposable to sets. Some regexes have a finite membership, many (most?) have an infinite membership. With me so far?
Well then why not implement the obvious set of operators for sets - intersection, union, subtraction, and most importantly subset? Why just membership (=~)?
Let's pick RubyLanguage for this because, hey, ruby's cool. Ruby regex syntax would overload, respectively, * + - and < operators [see OperatorOverloading]. I have a notion to employ < especially for a somewhat unusual form of MultiCaster ... but before reinventing wheels I thought it'd be worth finding out if anyone's heard of any such critter. -- PeterMerel.
Also worth noting that Ruby's Composable really should be based on < rather than <=> for the sake of poset representation. Yeah, I know the lineage, but still that's a sucky decision ... -- Pete.
Determining if one regex is the superset of another (i.e., determining if one regex accepts all strings accepted by another) is computationally very difficult, and may be undecideable. There's a good reason that RubyLanguage/PerlLanguage/etc. don't support those operations.
No, that is simply wrong. All the operations on regexes (or rather finite-automata) are theoretically very well understood, see e.g.,
The algorithms for negation and dissection are definitely not easy to implement, but that's really no excuse, that they are missing from most standard packages. But I bet there are libraries, that provide them. I'd really really like to use regex libraries, that support negation and intersection, because that would make some lexer much much easier to specify (example: string literals with embedded escapes). -- GunnarZarncke
But remember: RegularExpressionsArent---in other words, the "regex" feature in most languages trivially allows the constructions of automata which are more powerful than FiniteStateMachines. Many regex packages are (I think) TuringComplete, in which case the problem being discussed is most definitely undecideable.
I agree that TuringComplete regexes will kill the arithmetic suggested. But classical ones will do fine, and that's sufficient. OTOH ExtendedSetTheory seems more appropriate for the use contemplated by SymphonicArchitecture, so no worries. Thanks all for the helpful discussion. -- Pete
I'm familiar with a technique (applied to tables, actually) where the resolution of set inclusion/exclusion/intersection is a matter of run-time semantics. The set selection for term (B) is performed on the results of term (A), rather than on the entire set. I'm sure you know the drill.
It seems to me that something similar could be done with regex, so that the syntax is supported which means "resolve the first regex, then use that result as input to the next regex." The notation in abstract would look pretty normal (whatever that is), but rather than engage in trying to pre-compute what xY * xZ might mean, you'd resolve it by applying xZ(xY).
That is, if I grasp what you're asking.
If you're looking for a means to pre-compute the evaluation domain from two (or more) input regexes that seems really ambitious.
See Also: AlternativesToRegularExpressions