CacheCoherency: When two caches agree on what the values of variables (or memory addresses) are.
This does not mean that HardwareCaches will contain exactly the same data, but that whenever they are asked about a piece of data, they will return the same value. This can be achieved in several ways; the first that occurs to me is to invalidate each other's caches when a common variable or address is changed.
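To make that concrete, here is a minimal sketch in C of the write-invalidate idea (a toy model with one line per cache and invented names, not a real protocol such as MESI): a write through one cache marks any other cache's copy of that address invalid, so the next read refetches the current value from memory.

 /* Toy write-invalidate coherence sketch -- not real hardware. */
 #include <stdio.h>
 #include <stdbool.h>

 typedef struct { int addr; int value; bool valid; } CacheLine;

 static int memory[16];        /* shared backing store            */
 static CacheLine cache[2];    /* one line per CPU, for brevity   */

 static int cpu_read(int cpu, int addr) {
     CacheLine *c = &cache[cpu];
     if (!c->valid || c->addr != addr) {   /* miss: fetch from memory */
         c->addr = addr;
         c->value = memory[addr];
         c->valid = true;
     }
     return c->value;
 }

 static void cpu_write(int cpu, int addr, int value) {
     memory[addr] = value;                 /* write through to memory */
     cache[cpu] = (CacheLine){ addr, value, true };
     /* Invalidate every other cache's copy of this address. */
     for (int other = 0; other < 2; other++)
         if (other != cpu && cache[other].valid && cache[other].addr == addr)
             cache[other].valid = false;
 }

 int main(void) {
     cpu_write(0, 3, 42);                  /* CPU 0 writes address 3   */
     printf("%d\n", cpu_read(1, 3));       /* CPU 1 sees 42, not stale */
     cpu_write(1, 3, 7);                   /* invalidates CPU 0's copy */
     printf("%d\n", cpu_read(0, 3));       /* CPU 0 refetches: 7       */
     return 0;
 }

Real hardware tracks this per cache line, using a snooping bus or a directory rather than a loop over caches, but the principle is the same: a write makes every other cached copy stale.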
ParallelComputers?, which by definition have more than one CPU, have to implement CacheCoherence? at the hardware level. Some say this is the very reason why ParallelComputing? has never achieved major breakthroughs: because they always end up slower than comparable single-CPU machines.
Your phrasing in that last sentence seems to say that parallel processors are slower than a single processor, which probably is not what you meant.
The speedup that can be obtained from N processors is typically less than a factor of N, so the speedup is sub-linear. In some cases, however, the speedup is approximately linear, so it's not as if parallel processors are a failure. 4-way Pentium systems have been common in low-end servers for years, and larger numbers of processors are common in, e.g., high-end Sun servers. Parallel systems do speed things up -- just not as much as we would like.
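One common way to quantify that sub-linear speedup is AmdahlsLaw: if a fraction p of the work can be parallelized, N processors give a speedup of 1 / ((1 - p) + p/N). A small sketch in C (p = 0.90 is an arbitrary example figure, not a measurement):

 /* Amdahl's law: S(N) = 1 / ((1 - p) + p / N),
    where p is the parallelizable fraction of the work. */
 #include <stdio.h>

 int main(void) {
     double p = 0.90;                      /* assume 90% parallelizable */
     for (int n = 1; n <= 16; n *= 2) {
         double s = 1.0 / ((1.0 - p) + p / n);
         printf("N = %2d  speedup = %.2f\n", n, s);
     }
     return 0;
 }

With p = 0.90 the speedup can never exceed 10 no matter how many processors you add, which is why sub-linear speedup is the usual outcome even before counting coherence overhead.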
I don't think I would say "never achieved major breakthroughs", because there are lots of examples that could be claimed to have been breakthroughs (including the famous ILLIAC IV from the 1960s), so you should probably rephrase that part, too.
Also, there are many approaches to parallel processing, only some of which use shared memory or otherwise need cache coherency, so you might want to be more specific about which kinds you're addressing, just to avoid nitpicks. Presumably it would suit your purposes just to say "shared memory multiprocessing systems" or something similar.