How Would Refactoring Go From Indentation To Parsing

HowWouldRefactoringGoFromIndentationToParsing

This is a bit of a challenge. I here describe a program transformation that I have performed many times, which is nonetheless often painful. I do not see clearly how refactoring would lead to what experience seems to show is the "right" solution.

The task: Write a tool that filters a C++ language source file, and generates another file from it. For example, one that extracts a class interface .h file from a class definition that contains method implementations.

The initial code will probably rely on your coding standard:

  string line;
  while( (line = cin.getline()) != EOF  ) {
if( indentation(line) == 0 && firstword(line) == "class" ) {
cout << line
}
else if( indentation(line) == METHOD_INDENT ) {
cout << transform("s/{.*/;/",line);
}
...
  }

Eventually, you will want to make it work for code that does not meet your indentation standards. You tweak the rules a bit, but eventually you want to use a full language parser.

Q: how does incremental refactoring get you over the hump to fully parsing the language?

The above code looks vaguely like

  while(...) {
if( looking_at_class() ) { 
...
}
else if( looking_at_method() ) {
...
}

but I am aware of no compiler that works like this. Instead, you have to invert control, and write your code in an event driven form that is called by a framework, such as YACC/BISON. I do not see how incremental refactoring accomplishes this - or, perhaps, there should be a refactoring that accomplishes this in our catalog. (Similar transformations are used in media streaming (the so called "Word Filter") and in many other areas.)

Finally, even if you jump from using an indentation based scanner to a parser fully aware of the language, at some point maintenance becomes awkward (especially for a language like C++, with changing syntax at some points in its history), so you try to integrate your parser with that of a compiler, factoring out the common stuff. Good luck maintaining that: the compiler jobs will probably factor you out within a year. So you're left with duplicate code for language parsing, that needs to track the language as best you can, with the least possible maintenance.


Don't forget refactoring only does local optimizations that get you to the local maximum. You have to make a stab at the solution first. In that case, use the right tool for the job. Program in BNF not C++. To write a parser, write a parser. Or, more of a sound bite, refactoring is NoSilverBullet.


My try:

1. Develop a solution that uses the code conventions.

2. Use a standard prettifier to format the code (ident comes to my mind).

3. Chain solutions number 1. and 2.

Ja! That's easy! PickTheRightToolForTheJob!


Looks relevant: BigRefactorings BigRefactoringsAreHard BiggerRefactoringThoughts


It seems to me that you are asking a question at the wrong level. If the job is to parse the file, you need to start with a parser. Sure, you can DTSTTCPW and if you don't need to go further, you're fine. But once you decide to try for something independent of the formatting, you jump straight to needing a real parser.


Refactoring improves the design of existing code without altering its behavior. If you need to add new behavior or change existing behavior you don't refactor. You write a failing unit test. -- EricHodges

CategoryRefactoring


EditText of this page (last edited November 11, 2005) or FindPage with title or text search