Here is an example:
Omega: Here are the rules, make your choice.
Decision Theory: makes the choice.
Omega: Actually, I lied. You get the opposite of what I told you, so now you have lost.
Obviously, among the decision theories that assume Omega never lies, the better decision theory gets the worse result in this situation.
Even worse example:
Omega: We have two boxes and… oh, I realized I don’t like your face. You lose.
For each decision theory there can be an Omega that dislikes that specific theory, and then that theory does not win.
So, does the No Free Lunch-like theorem only predict these results, or something stronger than this?
Omega: Here are two opaque boxes. One contains $1, one contains $1M. You can open one box and keep whatever is in it. I have predicted your decision and put the big money in the other box. I’ve played this game a lot and haven’t lost the million yet.
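A toy simulation may make the trap vivid. This is my own sketch (the function names and payoff setup are hypothetical, not from the thread), assuming the agent is a deterministic, zero-argument choice procedure that Omega can simulate:

```python
# Toy model of the adversarial two-box game above (all names are hypothetical):
# Omega runs the agent's decision procedure, predicts its choice, and puts the
# $1M in the other box, so a deterministic agent always opens the $1 box.
def omega_game(agent):
    predicted = agent()                                  # Omega's prediction (perfect for a deterministic agent)
    payoffs = {predicted: 1, 1 - predicted: 1_000_000}   # the big prize goes in the box the agent won't open
    actual = agent()                                     # the agent's real choice
    return payoffs[actual]

def always_left():
    return 0

def always_right():
    return 1

print(omega_game(always_left), omega_game(always_right))   # 1 1 -- both walk away with $1
```

In this toy model anything Omega can simulate walks away with $1; only a choice Omega cannot predict would do better.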
This reminds me of the non-existence of a perfect encryption algorithm, where an encryption algorithm is a bijective map S → S, with S the set of finite strings over a given alphabet. The image of the set of strings of length at most n cannot lie within the set of strings of length at most n−1, so either no string gets compressed (reduced in length) or some strings become longer after compression.
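A quick tally makes the pigeonhole step concrete (a minimal sketch of my own, over a binary alphabet, with the empty string counted):

```python
# Pigeonhole count behind the argument above: over a binary alphabet there are
# 2**(n+1) - 1 strings of length <= n but only 2**n - 1 of length <= n-1, so no
# injective map can shorten every string of length <= n.
def count_up_to(n, alphabet_size=2):
    return sum(alphabet_size ** k for k in range(n + 1))   # includes the empty string

for n in range(1, 6):
    print(n, count_up_to(n), count_up_to(n - 1))
```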
I think this can be solved in practice by relying on the assumption that only a very sparse subset of all such strings will ever be fed to our encryption algorithm when it is embedded physically. Then, if we low-dimensionally parametrize hash functions of the form above, we can store the parameters for choosing a suitable hash function along with the encrypted text, and our algorithm only produces outputs longer than their inputs if we try to encrypt more than some constant percentage of all possible strings of length ≤ n, with n fixed (namely, when we saturate the suitable choices of parameters). If this constant is anywhere within a few orders of magnitude of 1, the algorithm is then always compressive in physical practice by finiteness of matter (we won’t ever have enough physical bits to represent that percentage of strings simultaneously).
Maybe a similar argument can be made for Omega? If Omega must be made of matter, we can always pick a decision theory given the finiteness of actual Omegas as implemented in physics. Of course, there may be no algorithm for choosing the optimal decision theory if Omega is allowed to lie unless we can see Omega’s source code, even though a good choice exists.
I don’t understand what you’re saying here. You mean you pick a hash function, and then store both the hashed value and the parameters for the hash function? But hash functions are lossy.
Why does perfection in an encryption algorithm require compression? Did you mean to say “compression algorithm”?
Some notes on compression algorithms:

If one defines a compression algorithm as a bijective map on all strings, and a decompression algorithm as the inverse of a compression algorithm, then by this definition every decompression algorithm is also a compression algorithm.

Suppose we apply a compression algorithm to a string, then apply it again to the result, and so on; call this the orbit of the string. Every orbit is either finite or infinite. A finite orbit eventually comes back to the original string. The lengths of the strings in an orbit cannot be strictly monotonically decreasing, and if they are constant, then the orbit must be finite. In an infinite orbit, the lengths of the strings tend towards infinity. So every compression algorithm either simply cycles between some strings, or eventually makes things bigger.
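A minimal sketch of the orbit picture (the swap map below is my own toy example, not from the comment above): swapping one long string with one short string and leaving everything else fixed already exhibits a finite cycle, and any orbit that never repeats must contain strings of unbounded length, since a bijection can visit the finitely many strings of length ≤ k only finitely often.

```python
# A toy bijection on binary strings that "compresses" exactly one string by
# swapping it with a shorter one and fixing everything else; iterating it
# traces an orbit.
def swap_map(s, a="0000", b="1"):
    if s == a:
        return b      # the one string that gets shorter
    if s == b:
        return a      # ...at the cost of one string that gets longer
    return s          # everything else is left alone

def orbit(s, f, max_steps=10):
    seen = [s]
    while len(seen) <= max_steps:
        s = f(s)
        if s == seen[0]:
            return seen            # finite orbit: we came back to the start
        seen.append(s)
    return seen                    # truncated; an infinite orbit would grow without bound

print(orbit("0000", swap_map))   # ['0000', '1'] -- a 2-cycle
print(orbit("0101", swap_map))   # ['0101']      -- a fixed point
```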
Yes, thank you, I meant compression algorithm.
That’s the precise NFL I had in mind.
I don’t see how this is comparable. It’s not like you’re shuffling around bad decisions and good decisions into an optimal arrangement. You just try to make better decisions, and the bad decisions can go take a hike.
(edit: basically, what Wedifrid said at 2:25)
Yeah, as I noted above. It’s reasonably obvious I’m not actually any sort of mathematician :-)
You have it. (Roughly speaking.)
Of relevance to No Free Lunch is the additional consideration that most decision theories perform poorly at tasks other than maximising expected utility according to the supplied values. That is, No Free Lunch tells us that decision theories are very bad at making bad decisions.
Problems where Omega is adversarial and punishes a specific decision theory are not very interesting. None of the standard problems where naive CDT/EDT fail are set up this way.
I don’t have an actual theorem, I’m rambling by analogy. I was thinking of something analogous to No Free Lunch in the compression field. There is no compression algorithm that is ideal for all data, but you can say “this one is better than this one” in a given domain. e.g. bzip2 compresses smaller than gzip for almost any input (though it’s more work to apply). Analogously, you shouldn’t expect any decision theory to be “perfect” for all input, just better for a sensible range of inputs.
Of course, I’ve just realised that the analogy is flawed: the compression NFL says no algorithm can make all possible inputs smaller, whereas a decision theory can be highly inefficient as long as it is correct, correctness being the question asked. But I’m wondering if there’s something NFL-like regardless.
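The bzip2-versus-gzip comparison a couple of paragraphs up is easy to spot-check with Python’s standard library. A minimal sketch, with zlib standing in for gzip’s DEFLATE and a placeholder input in place of real domain data:

```python
import bz2, zlib

# Rough way to compare compressors on a given domain (a sketch only: the sample
# input below is a placeholder -- substitute data from the domain you care about).
data = ("the quick brown fox jumps over the lazy dog\n" * 2000).encode()

print("raw:  ", len(data))
print("zlib: ", len(zlib.compress(data, 9)))   # DEFLATE, as used by gzip
print("bz2:  ", len(bz2.compress(data, 9)))    # Burrows-Wheeler, as used by bzip2
```

Running it on data from the domain you actually care about is what settles the “better in this domain” question.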