Testing Regular Expressions

At a recent meeting of the TorontoXpUsersGroup, the subject of regular expressions came up.

The original chunk of code that sparked the discussion had a method with several conditional RegularExpressions in it. Each regular expression had a one line comment above it explaining what it did.

A suggestion was made that all these regular expressions be refactored into methods with appropriately descriptive names. This would remove the need for the comments and would also make it possible to write test cases for each individual regular expression.

The downside to this suggestion would be that if the code had a reasonably large number of regular expressions then the number of tiny methods would increase dramatically. I'm generally in favour of smaller methods that do a single thing but for some reason that I can't really explain, the thought of splitting regular expressions out such that there was one method per regular expressions each seemed excessive.

Now that I've thought about it a bit, I think it's worth trying for a while to see if it's actually workable. I really like the idea of being able to write individual test cases for my regular expressions. Perhaps not all of them, but at least the complex ones.

FWIW the code in question was written in RubyLanguage although this applies to any language that has RegularExpression support.

-- MikeBowler


I think that there is one more reason why it is a good idea to refactor (ExtractMethod) in this way. I am no expert in regular expressions, but I noticed that each of the regular expressions that were in the code looked very similar. Almost as though the lines were cut and paste into place and then modified slightly.

I would expect that there would be enough in common between each of the methods that are extracted out to enable reuse. This, to me is one of the main reasons why ExtractMethod (and SmallMethods? for that matter) are very useful in code. -- MonteDenby


If the code actually had a similarity such as this then it would be a good idea to refactor them into a method but this was not actually the case. Each one was very distinct in operation and structure. -- BryanZarnett


It would be very unlikely that the any of the regular expressions would be reusable in the way MonteDenby suggests as each one is very different. The only advantages to using ExtractMethod in this case would be

-- MikeBowler


Nowadays, my code tends to be LotsOfShortMethods. I never used to code that way, but after reading RefactoringImprovingTheDesignOfExistingCode, my coding style has changed quite a bit. I'm getting comfortable with the LotsOfShortMethods style, and starting to like it a lot.


Some sample (Perl) code:

 # Replace all triple quotes with bold
 s,'{3}(.+?)'{3},<b>$1</b>,g

# Replace all double quotes with italics s,'{2}(.+?)'{2},<em>$1</em>,g


CategoryRefactoring


EditText of this page (last edited March 11, 2006) or FindPage with title or text search