System Size Metrics

Issues in measuring or defining "large systems" and/or "large applications"

Is a 200-table, billion-record Oracle database that is attached to 70 small MS-Access applications "big" or "small"? Do we measure the EXE sizes, the database, the total data, or the user count? In most shops I have worked at, almost everything is interrelated to varying degrees, so where the boundary of a "system" or "application" lies can become a fuzzy philosophical debate that never ends.

Some design philosophies split things into more, smaller "applications", and others into fewer, bigger ones, even though the total number of users and "screens" is about the same. The concept of "application" is even fuzzier with respect to web applications, as the interlinks can make it look like (or be) one big interlinked "web" of forms and screens. I am beginning to think that "application" is a rather artificial boundary imposed by technology restrictions or habits of the early '90s.

Measurement candidates:

AbcMetric
LinesOfCode
FunctionPoints
Number of tables in database
Number of total records in database
Number of total users
Size of EXE file
Number of "screens" or forms
Number of web scripts
Number of developers
Number of UseCases
Number of user transactions per unit time
Number of pages of help files needed to understand it
NumberOfKeystrokes

I think the problem is that the measurement is dependent upon the intended use. Without a clear statement of what one is trying to determine, it is impossible to identify what measure to use.

Some metrics are dictated by the problem domain, others by the nature of the solution. As a first approximation, the size of the problem domain is fixed; for instance, you cannot change the number of customer records in a database, because it is dictated by the nature of the problem. But things like the number of lines of code are apparently dictated only by our own cleverness (and the power of the language and environment we work with).

So let's split that list:

Size dictated by problem domain (ProblemSizeMetrics):
- Number of tables in database (when following best practices)
- Number of total records in database
- Number of total users
- FunctionPoints
- Number of UseCases
- Number of user transactions per unit time
- Number of pages of help files needed to understand it (when OneToOne with the UseCases)
- NumberOfKeystrokes
Size dictated by our cleverness/language/OS/etc. (SilverBulletSizeMetrics):
- AbcMetric
- LinesOfCode
- Size of EXE file
- Number of "screens" or forms
- Number of web scripts
- Number of developers

Supposed SilverBullets obviously should try to address SilverBulletSizeMetrics, but reducing ProblemSizeMetrics simply introduces flaws, by definition. Unless we're solving the wrong problem, of course. -- DougMerritt

Re: "Number of UseCases", that assumes you have the design documents. I kinda had in mind stuff that you can tell by looking at the existing system rather than assuming original design info is available (which it often is not).

But even in the absence of formal documentation, one must informally be aware of UseCases, or else one won't understand how the system does or should interact with its users. So even in the worst case it should be possible to estimate the number of UseCases, and hence have that as an estimated metric.

My own interesting idea - take all the data you want to include in your measurement, and compress it into a single file. The size of the compressed file is your metric. The compression tends to reduce the space taken by repeated data, so you are left with something approximately representing the essential size. -- DougKing

I am not sure what you mean. It seems it would be more influenced by the number of measurements taken rather than what is being measured.

By "all the data you want to include in your measurement", I mean whatever you consider to be your source code. So, take your source code tree and compress it with the likes of WinZip or tar/gzip/bzip. Repeated elements will be reduced in size. For example, a long variable name will not take up significantly more space than a short one. Repeated sections of CopyAndPaste code should also not count for as much. The impact of languages with long keywords and other fluff should also be reduced.
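A rough sketch of the idea, for anyone who wants to try it. This is not anybody's tested method from this page; the function name `compressed_source_size` and the choice of file extensions are assumptions made for illustration. It feeds every matching file through one zlib stream so that repetition across files can be squeezed out too. Note that zlib only looks back about 32 KB, so copy-and-paste separated by more than that will not be detected.

```python
import os
import zlib


def compressed_source_size(root, extensions=(".py",)):
    """Return the zlib-compressed size, in bytes, of all matching files under root.

    Hypothetical helper: the name and the default extensions are illustrative.
    """
    # One compressor for the whole tree, so repeats across files can be shared.
    compressor = zlib.compressobj(level=9)
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit directories in a stable order
        for name in sorted(filenames):
            if name.endswith(extensions):
                with open(os.path.join(dirpath, name), "rb") as f:
                    total += len(compressor.compress(f.read()))
    # flush() emits whatever the compressor is still buffering.
    total += len(compressor.flush())
    return total
```

Calling `compressed_source_size("src")` on a source tree gives a single number that can be logged per build or per checkin, much like a line count.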

Note: I have never really tried this; it just makes sense to me. It's almost as easy as counting newlines, and I think it gives a better idea of the size of the project (I really don't know how good it is, but it can't be any worse than LinesOfCode). I'm kind of fishing for comments or other thoughts about this here. -- DougKing

I really don't see the value of checking the size of the code zipped up. Seems like a bother; you'll have a zip file that needs to be freshened with each change. The value of LinesOfCode is that it is snap easy to count the lines. Just have a program count them. The only downside is that your coding style must remain constant.
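For comparison, a minimal line counter might look like the following. The name `count_lines` and the choice to skip blank lines are assumptions for illustration, not a standard definition of LinesOfCode.

```python
def count_lines(text, skip_blank=True):
    """Count the lines in a piece of source text, optionally ignoring blank ones."""
    lines = text.splitlines()
    if skip_blank:
        lines = [line for line in lines if line.strip()]
    return len(lines)


# The same logic in two layouts: the count depends on the coding style.
compact = "if ready: start()\n"
spread = "if ready:\n\n    start()\n"
print(count_lines(compact), count_lines(spread))
```

The two snippets do the same thing but count differently, which is the "coding style must remain constant" caveat in action.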

That's kinda the point. What a useless metric. Non-Commentary Lines of Code, ick. Anyway, if the creation of a ZIP file is part of the automated checkin procedure then it can give you those kinds of statistics. Whatever.

I am kind of warming up to the idea. It reduces issues of long single lines versus short multiple lines, for example.

Seems like using a ZIP would make the most difference for a project that had lots of repeated lines - that is, lots of CopyAndPasteProgramming, which is either informative or deceptive depending on your perspective. When we say "large project" what are we thinking about, anyway? The number of lines of code and time necessary to change or add a feature? Or simply the number of things it does? -- francis

Hmm. Maybe the difference in size between the raw and zipped files could be seen as some sort of software quality metric. -- Richard Rapp
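If someone wants to play with that, here is a small sketch of a compressed-to-raw ratio. The name `compression_ratio` is made up for this example, and whether the number says anything about quality is pure speculation, as above; it only shows how much of the input the compressor could squeeze out.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by raw size (lower means more repetition)."""
    if not data:
        return 1.0  # nothing to compress; treat as "no reduction"
    return len(zlib.compress(data, 9)) / len(data)


# A block pasted 200 times compresses far better than a single copy of it.
line = b"total = total + price * quantity\n"
print(compression_ratio(line * 200), compression_ratio(line))
```

Very short inputs can come out with a ratio above 1, because the compressed format carries a few bytes of overhead.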

I think the problem is that the measurement is dependent upon the intended use [of the measurement]. Without a clear statement of what one is trying to determine, it is impossible to identify what measure to use.

I think there is more knowledge in this one statement from above than all of the previous discussion.

I am not sure how "intended use" solves or simplifies anything. There may be many users and uses. It is a matter of where you draw lines around the parts. I don't think there is any one right answer. The best one can do is create a UsefulLie for a given moment. It is kind of like Southern California: the population sprawls all over the place. Where one "town" starts and another ends is a line that somebody draws just because we need names and political districts in order to communicate and assign human responsibilities. The borders may not mean much to shoppers, tourists, etc. Software tends to be divided by the organizational structure of the companies, but there is still often a lot of overlap and cross-over.

I believe the above reinforces the "intended use" statement rather than refuting it. The borders are there "in order to communicate and assign human responsibilities," but "may not mean much to shoppers, tourists, etc." The borders are only meaningful in terms of their intended use. Likewise, a software size metric is dependent upon its intended use. One simply needs a different measure if one is concerned with scheduling and budgeting than if one is concerned with available ROM or RAM space.

Perhaps there is some confusion over "intended use of system" versus "intended use of metric". [Thanks, this is a good clarification. I modified the original statement to make this clear.]

"Application" tends to be determined by human work divisions rather than by computers. For example, HR and Payroll may use some of the same databases, but each will have different "applications" for the most part. A payroll employee should generally not know or care about an employee's work history or performance reviews.

But size doesn't matter. ;)

Then why did Mr. Bobbitt spend all that money for repair surgery?

See: DivideAndConquer, DynamicLanguagesAndLargeApps

CategoryMetrics, CategoryEnterpriseComputingConcerns, CategoryScaling