OrphanPage 2004-05-31, last edit 2000-02-21
I've just finished an ECMA(Java)Script Validator here at XpEdinburgh. I thought I would describe it here with some of the issues it raised for me.
First the stories:
1) Ensure a given piece of text is a syntactically valid ECMA Script program.
2) Perform checking to ensure that method names are correct.
The user provided 6 sample syntax errors which they wanted picked up - these formed the functional tests for story 1.
First a spike was performed to generate an end to end solution. The simplest valid and invalid programs were used to test this. (Valid ';', Invalid '{')
The next stage was to try to break the story down into smaller tasks. This is where some implementation bias crept in. The tasks were tokenise input, parse statements and parse expressions. Now I really wish I had fought the fear factor and just let the design guide itself to see what an extreme parser would look like, but I guess that will just have to wait for the next time.
Tokenise input was chosen as the first task. During this (and all subsequent tasks) the ECMA Script spec provided the breakdown for test cases. Each production in the spec formed a class and a set of test cases.
I found this very useful as it certainly built confidence early. All the simple atomic productions can be thoroughly tested, and I can use that confidence to focus the tests at a higher level. For example, when testing an argument list I just have to test that brackets and commas are picked up correctly. I rely on the previous testing of valid and invalid identifiers to avoid retesting the same things twice.
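To make the layering concrete, here is a minimal sketch in Python (not the language or code of the original validator). The class names Identifier and ArgumentList and the method is_valid are hypothetical, and the grammar is heavily simplified: a real ECMA Script argument list holds expressions, not just identifiers. The point is only that the higher-level production delegates to the lower-level one, so its tests can concentrate on brackets and commas.

```python
import re


class Identifier:
    """Hypothetical class for a simplified identifier production."""

    _PATTERN = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

    def is_valid(self, text):
        # The whole text must match; partial matches are rejected.
        return self._PATTERN.fullmatch(text) is not None


class ArgumentList:
    """Hypothetical class for a simplified argument list production.

    Assumption for this sketch: arguments are plain identifiers.
    """

    def __init__(self, identifier=None):
        self._identifier = identifier or Identifier()

    def is_valid(self, text):
        text = text.strip()
        # Brackets are this production's own responsibility.
        if len(text) < 2 or text[0] != "(" or text[-1] != ")":
            return False
        inner = text[1:-1].strip()
        if inner == "":
            return True
        # Commas are handled here; each argument is delegated to Identifier,
        # which has already been tested thoroughly on its own.
        return all(
            self._identifier.is_valid(part.strip()) for part in inner.split(",")
        )
```

With this split, the identifier tests cover cases such as a leading digit, while the argument list tests only need a handful of cases for missing brackets and stray commas.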
However, this "trimming" of the possible test cases may have implications for how severe any refactoring could be. This is because the level of testing is biased by the implementation. But, I'm not sure how this can be avoided without cascading all the test cases up.
After a few productions had been written, I found that the test hierarchy could be refactored along with the implementation. This made subsequent tests easier and quicker to write, and so more were written - more confidence - go faster.
I found that spiking a solution at the start of each task helped to keep the visible functionality high. For example, the top level, which validates the whole program, was initially hard coded with one particular result - a ';' is OK, everything else fails. When the task for statements was started, the program validator was changed to use an iteration of statements. Initially the only statement supported was ';'. This was an Extract Class refactoring. As soon as a new statement was finished, the top level validator's functionality was increased.
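The two stages described above might look roughly like this. It is a sketch in Python under my own assumptions, not the original code: the names validate_program_spike, EmptyStatement and ProgramValidator are hypothetical, and the statement parsers work directly on characters rather than on tokens.

```python
def validate_program_spike(source):
    """Stage 1: the hard-coded spike. A lone ';' is OK, everything else fails."""
    return source.strip() == ";"


class EmptyStatement:
    """Hypothetical parser for the empty statement ';'."""

    def parse(self, source, pos):
        # Return the position just after the statement, or None if no match.
        if source.startswith(";", pos):
            return pos + 1
        return None


class ProgramValidator:
    """Stage 2: the top level iterates over statements.

    New statement parsers can be added to the list as they are finished,
    which increases what the top level accepts without changing this class.
    """

    def __init__(self, statement_parsers):
        self._parsers = list(statement_parsers)

    def is_valid(self, source):
        pos = 0
        while pos < len(source):
            if source[pos].isspace():
                pos += 1
                continue
            next_pos = self._parse_statement(source, pos)
            if next_pos is None:
                return False
            pos = next_pos
        return True

    def _parse_statement(self, source, pos):
        # Try each known statement form in turn at the current position.
        for parser in self._parsers:
            next_pos = parser.parse(source, pos)
            if next_pos is not None:
                return next_pos
        return None
```

Used as ProgramValidator([EmptyStatement()]), this accepts ';' and rejects '{', the same pair of programs used for the original spike.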
Anyway, to sum up, the first story took about 2 weeks. The question is, would it have been quicker using lex and yacc? Would I have had as much control over the error messages?
However, the best lesson I took from this came when it was time to start the second story. After implementing the first story and getting to know the ECMA Script spec, it became clear that the second story could only be implemented with a full scripting engine. (The type of a variable can only be determined at runtime - so there is no way to check whether a method is valid or not until then).
So I went back to the customer and explained this. They said they were happy to leave it out. But I had only spent 2 weeks, and I had implemented the simplest validator I could for the first story. So no wasted effort, customer gets a useful system quickly - everyone is happy!
Any comments or questions would be welcomed.