Encrypted Computation Edited

I reworked EncryptedComputation, but did not wish to impose the large change without review.

(last edited October 21, 2005) -> DeletionCandidate


Suppose you're working on some problem that requires a few trillion computations for a few hours. So you dial up your local supercomputer and have them run your program for you. Unfortunately for you, the local supercomputer company is owned by an industrial spy who sees what you're doing and steals all your data. What are you gonna do? The damage is already done; it's not like the law is going to fix anything.

This problem wouldn't occur if you needed a few terabytes of storage for a few hours. You'd just encrypt all the data going in and decrypt it as it comes out. The same goes if you just needed a few terabits of networking capacity (transmission being just a special case of storage).

This is not the same problem as DRM.

Context

You have an algorithm (and appropriate inputs), and wish to compute the outputs. However, you want to execute the computation in an untrusted environment without revealing the algorithm.

Note that you may or may not care to conceal the inputs/outputs.

Compare this with wishing to transmit or store data in an untrusted environment without revealing the data: note the lack of computation in the hostile environment.

Compare this with DRM, which tries to limit the use/propagation of the data to a specific context (e.g. a paying user). Note that the unencrypted data must become present in the untrusted environment, and the algorithm may not need to be secret (e.g. RSA public-key).

Forces

Computation Required
You wish to perform a computation, not just store/transmit data.
Controlled Environment Unacceptable
Even though you might control trusted environments, they are unacceptable for performing the calculation. For example, you wish to run a nuclear explosion simulation requiring a massive distributed network and you only control 1000 computers.
Hostile Environment
The environment, typically a remote server, is suspected of being untrustworthy to some degree. For example, an enemy may be able to observe program execution, or even be able to substitute for your algorithm.
Secret Algorithm
You wish to keep your algorithm secret. For example, it contains your business rules.
Proof of Intended Computation
You need to be able to prove that the correct algorithm was used, and specifically that an enemy did not replace, subvert, or sabotage your algorithm.
COTS Environment
You are limited to using off-the-shelf environments. In other words, you intend to use a typical computer.
Meaningful Use
You may wish to allow a (possibly hostile) third party to use your algorithm (e.g. a word processor), or you may wish to reserve meaningful use to yourself (e.g. compute your payroll).

It appears that you want to transform your algorithm into a different algorithm that hides its meaning, and marks its outputs in such a way that any alteration to the inputs/algorithm/outputs is detectable. The encrypted algorithm should be impossible to decrypt (for some definition of impossible).

Note that we care whether a human can understand the protected algorithm, since we are concerned with keeping an algorithm's method secret.

Note that this problem already exists today. Software companies routinely encrypt their software before selling it to users, to protect their algorithmic secrets and prevent derivative works. Unfortunately, their encryption scheme is broken since it relies on security through obscurity. It's easy enough to run either machine code or ByteCode through a decompiler.
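For instance, a minimal sketch (assuming Python as the bytecode in question; dis is the standard-library disassembler) of how directly compiled code gives up its secrets:

  # A compiled function's bytecode can be listed directly; the "secret"
  # threshold and discount rate appear as plain constants.
  import dis
  def secret_pricing(quantity, unit_price):
      discount = 0.1 if quantity > 100 else 0.0   # the business rule
      return quantity * unit_price * (1.0 - discount)
  dis.dis(secret_pricing)   # the disassembly shows 100 and 0.1 in the clear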

Execution Models

Compilation to binary, while obfuscating the code, is (usually) not intended as a security technique. (And as a security technique, it's a poor one.) Running the code through an obfuscator helps, but it is also a rather poor security technique.

In most cases, the purpose of compiling to native rather than shipping source is simply that native object files run faster, and most users aren't sophisticated enough to compile from source. In the Windows world, most users don't even have compilers.


What we need is an encrypted machine language. We need a way to encrypt a program so that the machine that executes it does not reveal the purpose of what it is executing. The program could never be decompiled without the secret key.

Many believe that this isn't possible. If the VM or runtime is to execute the program, it has to know what it does. Enough of the program semantics must be exposed so that the program can run; otherwise it's not useful to anyone.

There are two domains of "know what it does". One, which we are concerned about, is meaning-to-a-human. In other words, can my secret ever be understood and exploited? The second domain is a form of closure concerning the execution of the algorithm: the machine "knowing" what it is doing. Clearly, the VM has to be able to execute the protected algorithm, even using such meaningful steps as "add r1,r2". The protected algorithm uses the domain of normal operations in a correct manner; however, analysis of those operations should not reveal human-meaningful information.

Thus, we aren't turning the operations into gibberish, we are turning the algorithm into (human) gibberish. A weak form of this may be exemplified by the various obfuscated-programming contests.
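A toy illustration (an assumed example, not taken from any contest): both functions below perform the same computation with ordinary, meaningful operations, but the second hides the human-readable intent:

  # Two equivalent functions: the operations stay meaningful to the
  # machine, but the second version obscures the meaning to a human.
  def sum_of_squares(values):
      return sum(v * v for v in values)
  def g(z):
      r = 0
      for i in range(len(z)):
          r += (z[i] + 0) * (z[i] ^ 0)   # x ^ 0 == x for ints; pure noise
      return r
  assert sum_of_squares([1, 2, 3]) == g([1, 2, 3])   # both give 14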

Saying that "the CPU doesn't know what it does" is just a fanciful, and now admittedly bad, way of saying that a human with total information about the instructions performed (at the site of the CPU) still can't figure out what the program actually does, in any meaningful manner. The key point here is in any meaningful manner. What the CPU actually does with the instructions, we couldn't care less about. What we care about is what it means.

OK, I think I see what you're arguing for now. If CPU1 executes a program producing encrypted results and CPU2 decrypts those results then anyone with access only to CPU1 could be prevented from decompiling the program. Is that close? This isn't what people normally mean by encrypted computation.

Close? You're dead on. And I wasn't aware anyone else used the term 'encrypted computation' though you're right that most people working near this area work on DRM-like schemes.

Q: Therefore, can we rewrite an algorithm such that all human meaning is hidden, yet it still executes (on the same machine) and performs the same calculation?

A: We know this is true to some extent, as shown by obfuscated-programming contests.

Q: More strongly, can we prove that the original algorithm is unrecoverable by the enemy (for some definition of unrecoverable)?

A: At least some algorithms can be encrypted, apparently with the same guarantees as we expect for document encryption. See Solutions, below.

This is analogous to data encryption. A document, encoded in a character-set, is rewritten into the same character-set, yet the original information is hidden. Thus, we still have closure over the character-set (we can store/transmit the document anywhere we could the original), yet the meaning-to-a-human is gone.

There is a precedent in encryption; it is possible to encrypt a document so that someone may sign it. Then, when the document is decrypted, the person's signature is still visible despite their never having known anything about its contents.
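That precedent is blind signing. A minimal sketch with toy textbook-RSA numbers (illustrative only, not secure; pow(r, -1, n) is the modular inverse, Python 3.8+):

  # Blind RSA signing: the signer signs a blinded value without ever
  # learning the document m, yet the unblinded signature verifies on m.
  n, e, d = 3233, 17, 2753                 # toy key (p=61, q=53); signer holds d
  m = 1234                                 # the document, kept from the signer
  r = 123                                  # blinding factor chosen by the owner
  blinded = (m * pow(r, e, n)) % n         # what the signer actually sees
  blind_sig = pow(blinded, d, n)           # signer signs blindly
  sig = (blind_sig * pow(r, -1, n)) % n    # owner removes the blinding
  assert sig == pow(m, d, n)               # an ordinary signature on m
  assert pow(sig, e, n) == m               # and anyone can verify it against m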


If anything, you need either an encrypted bytecode and an appropriate virtual machine, or encrypted native code and a correspondingly appropriate chip.

The whole idea is to not trust the chip, nor the virtual machine because these things can always be under the control of a suspect third party. So we don't want an "appropriate virtual machine" nor an "appropriate chip"; that is the exact opposite of what we want. But it might require a special language.

The approach the industry seems to be headed toward (look at Palladium, the Xbox, and other "trusted" environments) is exactly this one: encrypt the executable so that it looks like gibberish to a hacker, and then have a trusted "kernel" which can expose the decrypted executable to the underlying machine, but doesn't make it available to the aforementioned hacker.

There are many problems with this approach, of course. The adversary may have at his disposal things like in-circuit emulators, logic analyzers, and the like, in order to recover program fragments that exist as plaintext.

It should also be mentioned that there are public policy issues to consider, such as the longstanding right to reverse-engineer.

The DMCA takes care of that little pesky problem.


The program could never be decompiled without the secret key.

Then the program couldn't be executed without the secret key, either. Therefore anyone with the authority to execute a program must also have authority to read the secret key. Computation can't be securely encrypted.

I can't "conceive" of how a CPU can execute an instruction it can't read. If you can, please explain.

Oh, it would be able to read the instruction and be able to perform it, but it wouldn't be able to make sense of the results. The compiler would diffuse the information about what each instruction does in the greater scheme of things across all of the instructions, and it would inject noise into the program just for the fun of it, with the result that the program can't be decompiled without knowing some secret key. This is how encryption works. It's the same with documents: you can always read an encrypted document; it just isn't meaningful.

Injecting noise only increases obscurity. The compiler can't "diffuse" information about what each instruction does.

But it diffuses information about what the original instruction does, in a way which seems analogous to data encryption.

In fact, doesn't the encryption of the algorithm have to satisfy at least the criteria for document encryption? Of course, it also has to allow us to perform the desired calculation.


It might be possible if the first thing the program did was hash the virtual machine it's being run on. This would establish that the virtual machine was trusted. Then this problem reduces to making the virtual machine secure.

Unfortunately, not. The program can always be modified to make it run on an insecure virtual machine.


To create a DRM scheme on top of encrypted computation, you'd need the additional property that the results of the program can be decrypted with a key that can't decrypt the program itself.

One difference between encrypted computation and DRM is that with DRM the key to decrypting the results must be different from the key to decrypt the program. Another is that the key to decrypting the inputs must be different from both of these other keys. An encrypted computation scheme where the owner of the program must encrypt the inputs and decrypt the outputs would still be useful, just useless as DRM.


The result of encrypting a program should look like a random program that produces garbage, with no way to prove that it does anything else except for the fact that it doesn't crash. Ideally, without knowing the secret key, you could "decompile" the encrypted program to several billion different plaintext programs that do vastly different things with most of them producing garbage. So you could execute a program and have no idea whether it's a text editor or a physics simulation or even just garbage.

That works only if the user doesn't use the program. At some point they have to find out if it's a text editor or it really is garbage. Once they find out it's a text editor they have all of the information they need to recreate the source code.

This highlights a force that was not specified: meaningful use must be restricted to yourself (or not). This force probably changes things considerably, as we see in the above comments. Do you want an untrusted user to use the results (e.g. run the word-processor on their own documents)? Or, do you want to use the untrusted resource and reveal nothing (e.g. restrict meaningful use to yourself)?

It seems likely that allowing the enemy to use the algorithm may let them extract some of its secrets. By definition, you aren't denying them meaningful use. But you may still deny them knowledge of the complete algorithm (e.g. "what are the effects of this particular atomic explosion?" vs. "how do you calculate the effects of an atomic explosion?").

Again, merely protecting the algorithm is not sufficient to implement DRM, since DRM wants to control the use of the data (not necessarily the algorithm).


Imagine that a printing press is a degenerate CPU. When it reads 'a' it prints 'a', and so on for all the characters in its instruction set. Now, you can encrypt a document and the printing press knows how to perform (print) it without ever having to decrypt it. It's just that the result of this computation is meaningless.

Finally, the printing press example might lead some people to think that encrypted computation is possible, but the printing press is actually operating under a simple known program (copy this character) and an unknown stream of data. The point of the example isn't to support the idea of encrypted computation, but merely to make the idea of encrypted computation conceivable.
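With that caveat, a toy rendering of the analogy (an assumed example; the "cipher" is just a Caesar shift):

  # The "press" only copies characters; it never holds the key, and the
  # result it produces is as meaningless to it as its input was.
  KEY = 7
  def encrypt(s):
      return "".join(chr((ord(c) - 97 + KEY) % 26 + 97) for c in s)
  def decrypt(s):
      return "".join(chr((ord(c) - 97 - KEY) % 26 + 97) for c in s)
  def press(document):
      return "".join(document)        # degenerate CPU: reads 'a', prints 'a'
  printed = press(encrypt("attackatdawn"))
  assert decrypt(printed) == "attackatdawn"   # only the key holder reads it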


Solutions

There appears to be no general solution; however, there are solutions for limited cases.


See http://bitconjurer.org/wild_cryptographic_conjectures.html

The actual references:

One of the weaknesses of the paper is that they don't consider relaxing the Functional condition. They're focused on DRM schemes, so they can't afford to relax it, but it can be relaxed.


HomomorphicEncryption?
Appears to address a question something like "can I encrypt the inputs, compute on the encrypted values, and decrypt the result, all using the same encryption function?":

...Feigenbaum and Merritt asked: "Is there an encryption function E() such that both E(x+y) and E(xy) are easy to compute from E(x) and E(y)?" They were asking explicitly for so called algebraically homomorphic encryption schemes. Unfortunately, there has been little progress made in determining whether such encryption schemes exist that are efficient and secure, although it is one of the crucial open problems in cryptography. -- Homomorphic Cryptosystems and their Applications (Dörte Rappe, Ph.D. Thesis, University of Dortmund, Germany, August 2004), http://www.rappe.de/doerte/Diss.pdf
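As a partial, concrete illustration (a sketch with toy parameters, not a secure or fully homomorphic scheme): unpadded RSA already satisfies half of that wish, since E(xy) is easy to compute from E(x) and E(y), though it offers nothing for E(x+y).

  # Textbook (unpadded) RSA is multiplicatively homomorphic.
  n, e, d = 3233, 17, 2753                    # toy key (p=61, q=53)
  E = lambda x: pow(x, e, n)                  # encrypt
  D = lambda c: pow(c, d, n)                  # decrypt
  x, y = 12, 34
  assert D((E(x) * E(y)) % n) == (x * y) % n  # E(x*y) from E(x) and E(y)
  # No comparable trick yields E(x + y), which is what the quoted open
  # problem asks for in an "algebraically homomorphic" scheme.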

HomomorphicEncryption? sounds like an idea I had some time ago about using public search engines to index private documents: The content would be encrypted on a word-by-word basis. After indexing, you could simply use the public search engine interface to search for encrypted keywords. But even if the encryption scheme is known to be secure, I wonder if an adversary could deduce some information by statistical means. What do you think? -- AntonioTaylor?

Ummm, I can't see the correspondence. You're describing a common storage encryption scheme subject to a common type of attack, just in a novel setting.

As for attacks? Well, most encryption schemes have problems with short strings like individual words. And if each block is known to be one of the few thousand words in the English language, that causes a big problem. But if one word maps to any number of blocks (due to the injection of noise) then you should be completely safe.

I regard it as possibly a computation (a database query) that is not uncommon to outsource, secured against the party computing the result: a form of EncryptedComputation. If I were only concerned with encryption, I would encrypt blocks of words rather than individual words, but then the other party would not be able to compute the desired result. -- AntonioTaylor?

I wish you were right, but the relation

  decrypt(encrypt(operation(A, B))) == decrypt(operation(encrypt(A), encrypt(B)))

isn't preserved. In order to make the encryption scheme secure, and to prevent building a dictionary, it's necessary to inject noise into the blocks so that two encryptions of the same A differ (encrypt(A) != encrypt(A) across invocations). If you do preserve this relation, and you limit your plaintext to short strings, and you limit them to words, then it becomes possible to use statistical attacks to build up a dictionary of words to their encrypted counterparts. That doesn't break the encryption scheme, but it does completely bypass it.

That's exactly my point. If I'm encrypting word-by-word, the relation holds true for the database queries, but an adversary could deduce information. But you reminded me of the possibility of inserting noise words between the actual data. That would still allow simple queries on the encrypted data, while phrase queries would have to be transformed or might become impossible, depending on the existence of wildcards in the query language. It would also make statistical analysis more difficult, but not impossible. I suppose bypassing the encryption scheme this way is not only possible in this example, but is the crucial thing that makes EncryptedComputation theoretically worthless. -- AntonioTaylor?
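For what it's worth, a sketch of the scheme under discussion (an assumed design; HMAC stands in for the deterministic word-by-word encryption): the untrusted index can match query tokens without the key, but identical words produce identical tokens, which is exactly the frequency leak described above.

  # Deterministic word-by-word tokens: searchable by the server, but
  # token frequencies mirror word frequencies, enabling dictionary attacks.
  import hmac, hashlib
  from collections import Counter
  KEY = b"owner-secret-key"
  def token(word):
      return hmac.new(KEY, word.encode(), hashlib.sha256).hexdigest()[:16]
  document = "the cat sat on the mat and the dog sat too".split()
  index = [token(w) for w in document]        # what the search engine stores
  assert token("cat") in index                # the owner's query still works
  print(Counter(index).most_common(1))        # "the" stands out immediately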


Consequences

Note that having encrypted computation (inputs/outputs in the clear, e.g. meant to be used by third-parties) would help both the free software cause and the secure computing cause. Running commercial software which performs operations its users can never know would force users to wake the hell up. It would finally put commercial software in the same category as computer viruses (completely untrusted) and force genuine security.

The algorithm would probably run less efficiently.

As of ~2004, only certain algorithms can be encrypted, and only in certain ways (e.g. HomomorphicEncryption?).


CategorySecurity

