Traits Of Good Scientific Evidence

Some people believe that good scientific evidence does the following:

Defines the metrics to be used before the experiment takes place.
GoodMetricsUseNumbers. Qualitative metrics are often difficult to verify and summarize.
Lists the weights of each metric up front, along with the source of and reason for the weights.
Measures the outputs instead of the internal design. For example, a rocket may be based on wonderful and clever theory, but the ultimate test is its ability to hit the target with given/expected resources (such as fuel).
Uses a variety of metrics if possible. For example, if you are measuring "car safety", use crash tests at a variety of speeds and angles if possible, not just, say, a front-on crash test at 30 mph. Similarly, if you are measuring code volume, "lines" may not be sufficient because line counts are easy to "game" (WaterbedTheory). Thus, number of tokens, number of statements, number of blocks, number of named blocks (functions and methods), number of characters, number of contiguous segments, etc. can be used as well.
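A minimal sketch of measuring code volume several ways at once, assuming the code being measured is Python and using only the standard `ast` and `tokenize` modules. The function name `code_volume` and the exact choice of which node types count as "blocks" are illustrative assumptions; "contiguous segments" is left out because the page does not define it.

```python
import ast
import io
import tokenize

# Token types that carry layout rather than content; excluded from the token count.
LAYOUT_TOKENS = {
    tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
    tokenize.DEDENT, tokenize.COMMENT, tokenize.ENDMARKER,
}

# Compound statements that own an indented block (an assumed definition of "block").
BLOCK_NODES = (
    ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.With, ast.AsyncWith, ast.Try,
)


def code_volume(source: str) -> dict:
    """Return several independent size measures for one Python source string."""
    nodes = list(ast.walk(ast.parse(source)))
    tokens = [
        tok for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in LAYOUT_TOKENS
    ]
    return {
        "lines": sum(1 for line in source.splitlines() if line.strip()),
        "characters": len(source),
        "tokens": len(tokens),
        "statements": sum(isinstance(n, ast.stmt) for n in nodes),
        "blocks": sum(isinstance(n, BLOCK_NODES) for n in nodes),
        # Named blocks = functions and methods, as in the list above.
        "named_blocks": sum(
            isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes
        ),
    }


dense = "def f(x): return x + 1\n"
spread = "def f(x):\n    y = x + 1\n    return y\n"
print(code_volume(dense))
print(code_volume(spread))
```

Both snippets do the same thing, yet the line count differs by a factor of three while the named-block count stays at one; that kind of divergence is why leaning on a single measure makes it easy to game.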

Note that there are metrics not in this list that may still be important. However, they may be resistant to scientific analysis. (See SovietShoeFactoryPrinciple.)

Some people believe that Top's little summary is wrong.

Qualitative metrics are a significant part of science - if you need numbers, use 0 and 1. Metrics should be defined, of course - and formally so.

Finally, internal design is extremely important in science - indeed, figuring out how the internal design produces the outputs, and how to predict new outputs for new internal designs, is much of what science is about. TopMind may be confusing science and engineering. Engineering takes what science has already proven and creates an internal design with the goal of obtaining a particular output. Pure science never cares about obtaining a particular output (though science is rarely pure). Besides, even in engineering, when comparing two systems that achieve the same ends, what matters are the other properties each solution has relative to the other - relative expense, performance, safety, adaptability to other missions, etc. Of two systems that achieve the same goal, one can quite possibly be objectively and measurably the better solution.
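One way to make "objectively and measurably better" concrete without any weights is Pareto dominance: system X beats system Y if it is no worse on every measured property and strictly better on at least one. This is a minimal sketch of that idea, not something the page itself specifies; the property names and numbers are hypothetical.

```python
from typing import Mapping

# For each measured property: True means "higher is better", False means "lower is better".
# These property names are hypothetical examples.
DIRECTION = {"cost": False, "failure_rate": False, "payload_kg": True}


def at_least_as_good(a: float, b: float, higher_is_better: bool) -> bool:
    """True if value a is no worse than value b in the given direction."""
    return a >= b if higher_is_better else a <= b


def dominates(x: Mapping[str, float], y: Mapping[str, float]) -> bool:
    """True if x is no worse than y on every property and strictly better on one."""
    no_worse = all(at_least_as_good(x[k], y[k], up) for k, up in DIRECTION.items())
    strictly_better = any(x[k] != y[k] for k in DIRECTION)
    return no_worse and strictly_better


a = {"cost": 90, "failure_rate": 0.02, "payload_kg": 1000}
b = {"cost": 100, "failure_rate": 0.02, "payload_kg": 1000}  # same as a, but pricier
c = {"cost": 80, "failure_rate": 0.05, "payload_kg": 1000}   # cheaper, but less reliable
print(dominates(a, b), dominates(a, c), dominates(c, a))
```

Here `a` dominates `b` with no weighting needed, while `a` and `c` trade off against each other; picking between those two requires a mission or a preference, which is exactly where the disagreement about weights comes in.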

And regarding the "list of weights": there shouldn't EVER be arbitrary weights on metrics. Joining a bunch of numbers together to get one summary number is fine for making summaries, but summaries are not scientific evidence. Attempting to create a formula for making predictions is a separate endeavor from any measurement-taking activity, and should be firmly based on deriving the formula from one dataset and testing its predictions against another (possibly one collected in the future). A formula can be scientific, because one can measure its accuracy and precision against many future observations and datasets. But formulas are not 'weighted metrics'; they are functions over a model.
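A minimal sketch of "derive the formula from one dataset, test it against another", assuming a straight-line model fitted by ordinary least squares. The function names and all the numbers are made up for illustration; the point is that the formula is judged by its error on data it never saw.

```python
from statistics import fmean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx


def mean_abs_error(model: tuple[float, float], xs: list[float], ys: list[float]) -> float:
    """Average absolute prediction error of the fitted line on a dataset."""
    slope, intercept = model
    return fmean(abs(slope * x + intercept - y) for x, y in zip(xs, ys))


# Hypothetical data: fit on the first set, evaluate on a second, later set.
train_x, train_y = [1, 2, 3, 4], [3, 5, 7, 9]
test_x, test_y = [5, 6], [11.5, 12.5]

model = fit_line(train_x, train_y)
print(model, mean_abs_error(model, test_x, test_y))
```

The fit is perfect on the data it came from, which says little; the error of 0.5 on the held-out set is the number that actually measures the formula.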

In an ideal world, yes. But in practice, comparing most technology requires some kind of weighting in order to connect it to human needs. For example, cars have fuel efficiency, acceleration, cost, etc. as common factors. Rockets have reliability (failure rate), fuel efficiency (lift per unit of fuel), accuracy, cost, etc. --top

Cars have fuel efficiency and such - those are measurable traits. But you can't call it scientific to add up "3 x Fuel Efficiency + Acceleration / (2 m/s^2) - Cost / $1000 + ...", or to place these traits in priority order without a particular mission in mind. There is no science behind weights. If all you mean is that there are several factors to examine, then there is no issue of weights at all. If you wish to qualify a rocket or car for a particular mission (e.g. "Expected Total Cost of Ownership over 5 years given regular maintenance and fueling for an estimated 200 miles per week"), you can perform such qualifications. But that still isn't weighting. Perhaps you mean something different by 'weights' than what I'm hearing?
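The "Expected Total Cost of Ownership" qualification can be sketched as a derived quantity rather than a weighted sum. This minimal version assumes 52 weeks per year, a flat fuel price, and a constant yearly maintenance cost; the field names, default fuel price, and car figures are all hypothetical. Each coefficient comes from the mission statement or a price, with units, rather than from someone's sense of importance.

```python
from dataclasses import dataclass


@dataclass
class Car:
    purchase_price: float        # dollars
    mpg: float                   # measured miles per gallon
    maintenance_per_year: float  # dollars per year (assumed constant)


def five_year_tco(car: Car, miles_per_week: float = 200, fuel_price: float = 3.50) -> float:
    """Expected 5-year cost in dollars for the stated mission (hypothetical fuel price)."""
    weeks = 52 * 5
    gallons = weeks * miles_per_week / car.mpg
    return car.purchase_price + gallons * fuel_price + 5 * car.maintenance_per_year


efficient = Car(purchase_price=20000, mpg=40, maintenance_per_year=500)
cheap = Car(purchase_price=18000, mpg=25, maintenance_per_year=600)
print(five_year_tco(efficient), five_year_tco(cheap))
```

For this particular mission the car with the higher sticker price comes out cheaper overall (27050 vs 28280 dollars); change the mission's mileage and the answer can change, but no arbitrary weight was ever chosen.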

The reader or results-user can change the weights if they want, as long as they have full access to the base info. For example, a manned space mission would weight the reliability of the rocket more heavily than a customer for unmanned launches would.
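A minimal sketch of keeping the base measurements separate and letting each results-user supply their own weights. The rocket names, the normalized scores (assumed to be scaled so higher is better on every axis), and both weight sets are hypothetical.

```python
# Raw measurements, kept apart from any weighting (hypothetical, normalized so higher is better).
ROCKETS: dict[str, dict[str, float]] = {
    "A": {"reliability": 0.99, "cost_efficiency": 0.60},
    "B": {"reliability": 0.95, "cost_efficiency": 0.90},
}


def score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of one rocket's metrics under one user's weights."""
    return sum(weights[k] * metrics[k] for k in weights)


def rank(weights: dict[str, float]) -> list[str]:
    """Rocket names ordered best-first under the given weights."""
    return sorted(ROCKETS, key=lambda name: score(ROCKETS[name], weights), reverse=True)


manned = {"reliability": 0.9, "cost_efficiency": 0.1}
unmanned = {"reliability": 0.3, "cost_efficiency": 0.7}
print(rank(manned), rank(unmanned))
```

The same base data produces opposite rankings for the two hypothetical customers, which is why the weights are only useful if the underlying measurements are published alongside them.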

But if you are charged with providing objective measurements, then usually you will need to apply such weights. For example, the Federal government was tasked with providing a fuel efficiency rating and a safety rating for automobiles. They had to assign weights to the various tests in order to produce a single final score. I agree it is not the ideal approach, but we have nothing better to replace it with so far. Perhaps we should rename the topic to "TraitsOfTheBestPossibleScience"?

Summary != Science

The summary exists to make it easier for somebody to change the weights to fit their requirements (user needs). It is not the total package.

Internal design? What might that BuzzPhrase mean anyway?

I'm fairly certain it refers to the innards of a black-box product that accomplishes a particular mission or service. The 'external' design would then be the service or purpose to which the product is applied. TopMind uses the term in relation to software, but it would also apply, e.g., to transformers (a black box that trades voltage for current) and to bridge construction (providing a service to carry vehicles across otherwise impassable terrain). One can call the service side 'external design' because ultimately that black box fits into some grander design. What external design is NOT talking about is physical shape; that's part of 'internal' design, unless aesthetics are part of the service.

As opposed to external design? All designs must take into account external and internal factors. A rocket ship must account for winds and gravity (external factors). And who doesn't take these external factors into account? People with opinions who don't use science but go on gut feelings, possibly. Forget gravity! Gravity is just an internal thing in the earth's core! What happens when we really put a rocket against gravity? Hah! Let's send this rocket up based on my personal opinion of what will happen (since I don't believe in gravity), and let's throw 10 million dollars down the drain if the mission fails (instead of first proving it internally, whatever internally means).

Also, it is quite obvious that a database must take into account what users will be selecting from it (outer factors), which is exactly what internal design also studies, since internal and external are not somehow separate. (In fact, I don't even know what the phrases internal design and external design really mean on this page. It seems like an inflammatory hidden insult toward math, science, and theory, without using a BadWord.) All scientists with an IQ over 6 realize that internal factors are not separate from external factors (again, what the vague term internal design means, I have no idea).

Sounds like internal design is a coined BuzzPhrase to describe anyone who uses math, science, and paper to help them predict results. (Oh, bad on them. How could they be so naive?) Well, rockets don't always have the chance to fly a 200 million dollar test case here and there, and missions thankfully can be proven on paper on the cheap (favorite phrase) and/or with emulators first. Yes, using science, math, and research, we can save a lot of money and avoid costly errors - whether you like it or not.

The theory that the moon was round rather than flat was not proven externally (some guy traveling to the moon and checking); rather, it was most likely proven internally, in someone's telescope (what does internal mean again? Oh, that's right, it is a buzz phrase and I can twist it to mean whatever I want, including the mirror in some person's telescope).

This is because once telescopes were readily available to those in power, they could see for themselves that the moon was round. However, if there were still a dispute even after that, numeric metrics could be applied, such as the axis ratio (minor to major axis) and the orientation of the craters' elliptical outlines, both of which change closer to the edge of the disk. If both of these numbers match a model (mathematical or miniature) of round craters viewed at an angle because they sit on a sphere, that lends credibility to the round hypothesis. This would put the ball in the flat camp's court to explain the elliptical shape and orientation of craters near the edges. Just because something didn't need to be solved with metrics does not mean it can't be.
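The round-moon model's prediction can be written down directly. A minimal sketch, assuming the observer is far enough away that the projection is effectively orthographic, and ignoring crater depth, rim shadows, and libration: a circular crater whose projected distance from the disk center is r (on a disk of radius R) sits on a surface tilted by an angle θ with sin θ = r/R, so it appears as an ellipse with minor/major ratio cos θ = sqrt(1 - (r/R)^2), with the short axis pointing at the disk center. A face-on flat disk would instead predict a ratio of 1 everywhere. The function names are hypothetical.

```python
import math

# Prediction of the flat, face-on disk hypothesis: circles stay circles everywhere.
FLAT_DISK_AXIS_RATIO = 1.0


def predicted_axis_ratio(r: float, disk_radius: float) -> float:
    """Minor/major axis ratio of a circular crater at projected distance r from the
    center of a sphere's disk, assuming orthographic projection."""
    if not 0 <= r < disk_radius:
        raise ValueError("r must lie in [0, disk_radius)")
    return math.sqrt(1 - (r / disk_radius) ** 2)


def predicted_minor_axis_angle(x: float, y: float) -> float:
    """Position angle (radians, in [0, pi)) of the crater's minor axis for a crater at
    disk coordinates (x, y) measured from the disk center. Foreshortening on a sphere
    is radial, so the minor axis lines up with the direction to the center.
    (At the exact center the ratio is 1 and the angle is meaningless.)"""
    return math.atan2(y, x) % math.pi


for frac in (0.0, 0.5, 0.9):
    print(frac, round(predicted_axis_ratio(frac, 1.0), 3), FLAT_DISK_AXIS_RATIO)
```

Measured crater outlines that shrink along the radius as they approach the limb, matching this curve, would be the numeric evidence described above.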

Outer factors such as winds and gravity obviously affect the rocket mission (they are not part of the internal design but are outer scientific factors). What is your point? Rocket failures usually have nothing to do with scientific theory failing - rather, humans are usually the cause of rocket mission failure (and humans are an outer factor, not part of the internal, scientifically proven design; if humans failed, then they didn't use enough science). Failures are usually due to some stupid software error, such as centimeters being passed in where millimeters were expected (which a strong type system might help with). Internal errors such as software errors are problems from external sources (a human's fingers). So I'm still not sure what your point is. If we can use science to show in our emulator that a 200 million dollar rocket will fail, please do. Don't send it to some outer planet and waste 200 million on an outer test case until it has been rigorously proven on the cheap (science, math, and paper are all cheap). If we have appropriate measures to test first, we should use them. Prove it before sending it into the real external world, where failure could mean a 200 million dollar loss.
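A minimal sketch of how distinct unit types keep centimeters from being passed where millimeters are expected. The class names and the `set_nozzle_offset` routine are hypothetical. Python only enforces the check at run time, via the explicit `isinstance` test; in Ada, or with a static checker such as mypy reading the annotations, the mismatch would be rejected before the program ever runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Millimeters:
    value: float


@dataclass(frozen=True)
class Centimeters:
    value: float

    def to_mm(self) -> Millimeters:
        """Explicit, named conversion: the only sanctioned way to get millimeters."""
        return Millimeters(self.value * 10)


def set_nozzle_offset(offset: Millimeters) -> float:
    """Hypothetical guidance routine that only accepts millimeters."""
    if not isinstance(offset, Millimeters):
        raise TypeError(f"expected Millimeters, got {type(offset).__name__}")
    return offset.value


print(set_nozzle_offset(Centimeters(2.5).to_mm()))  # explicit conversion is accepted
try:
    set_nozzle_offset(Centimeters(2.5))              # wrong unit is refused
except TypeError as err:
    print("rejected:", err)
```

A bare float carries no unit at all, so it is refused as well; the cost is a little ceremony at each conversion point, which is the trade-off strongly typed languages make deliberately.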

Pre-flight testing is often done with simulations, which are a cheaper, but imperfect, stand-in for reality. Simulations for software engineering are more difficult because our models of the human mind are far less accurate than our physical models, and vary per mind.

Also, I am skeptical of "[rocket] Failures are usually due to some stupid software error". Many are, but I doubt most. (Related: QA article cited in SpaceShuttle.)

You are skeptical... well, I have never once heard of a rocket that crashed due to science or math. It is almost always a failure of a human doing something wrong. Please provide references to rocket crashes caused by science and math failing, or by whatever it is you think causes rocket mission failures. That is exactly why Ada and similar languages are recommended: they prevent dumb errors (and more complex errors too) and support strong type checking, contract programming (SPARK), integrity checks, etc.

It was either Mariner I or Mariner III that crashed due to a Fortran coding error. I do agree that a strongly typed language is better for expensive and life-threatening missions. However, that does not necessarily mean they are good for every domain. PickTheRightToolForTheJob.

But a Fortran coding error is a human error. Either the human coded something wrong, or the Fortran compiler generated incorrect code because the compiler was created by a human who made a mistake.

Maybe I am not understanding the question.