Parsing Html With Regex

AntiPattern Name: Parsing HTML with regular expressions

Type: Development

Problem: Extracting information from HTML text, such as elements or groups of elements with a specific property.

Context: The programmer knows how to use regular expressions and sees that HTML looks fairly regular, and therefore easy to parse with them.

Causes:

Supposed Solution: Write a half-baked regular expression that works for specific test cases.

Resulting Context: The regular expression may work for most cases, but it fails on oddly structured yet valid HTML and on broken HTML, both of which are all too common on the Wild Wild Web.
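
A minimal sketch of that failure mode, in Python. The pattern and the sample markup are made up for illustration and are not taken from any real project. The pattern handles the one test case it was written for, then misses three ordinary links and reports one that is commented out.

```python
import re

# Hypothetical "works on my test case" pattern for pulling out link targets.
LINK_RE = re.compile(r'<a href="([^"]*)">')


def regex_links(html):
    """Return every href value the naive pattern manages to match."""
    return LINK_RE.findall(html)


# The test case the pattern was written against.
simple = '<a href="/home">Home</a>'

# Oddly structured markup of the kind found in the wild (illustrative sample).
tricky = (
    "<A HREF='/single-quoted'>one</A>\n"                # upper case, single quotes
    '<a class="nav" href="/extra-attribute">two</a>\n'  # attribute before href
    '<a\n   href="/newline-in-tag">three</a>\n'         # line break inside the tag
    '<!-- <a href="/commented-out">four</a> -->\n'      # not a live link at all
)

print(regex_links(simple))  # ['/home']
print(regex_links(tricky))  # ['/commented-out']
```

Patching the pattern for each of these cases is possible, but every patch tends to expose another case, which is how the expression ends up half-baked.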

Common Resulting problems:

Refactored Solution: Find a data source that was designed to be used as a data source. Alternatively, use an existing parser whose complexity matches that of HTML.
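
As one possible illustration of the second option, the sketch below uses html.parser from the Python standard library. The class name LinkCollector, the helper parser_links, and the sample markup are hypothetical names chosen for this example; any established HTML parser would serve the same purpose.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from anchor start tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag and attribute names already lowercased,
        # and attribute values with their surrounding quotes removed.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def parser_links(html):
    """Return the href of every live anchor tag in the given markup."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Illustrative sample with the usual irregularities.
sample = (
    "<A HREF='/single-quoted'>one</A>\n"
    '<a class="nav" href="/extra-attribute">two</a>\n'
    '<a\n   href="/newline-in-tag">three</a>\n'
    '<!-- <a href="/commented-out">four</a> -->\n'
)

print(parser_links(sample))
# ['/single-quoted', '/extra-attribute', '/newline-in-tag']
```

The parser takes care of case, quoting style, attribute order, line breaks inside tags, and comments, so the calling code only has to say which tag and attribute it wants.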


See also: ParsingHtml, http://stackoverflow.com/a/1732454/699224 for another take on this anti-pattern


CategoryDevelopmentAntiPattern


Last edited December 28, 2012