I’m an expert in a neighboring field: numerical optimization. I’ve seen lots of evidence for Jaynes’s impression that for any algorithm that uses randomness, one can find a deterministic technique (which takes thought) that accomplishes the same goal better. (For example, comparing genetic algorithms, simulated annealing, and tabu search, the last has a deterministic memory mechanism that gives it the ability to do both diversification and intensification, with much more control than either of the other two.) Random methods are employed frequently because to do otherwise would require thought.
As for the debate, to me it looks like it was over terminology. To illustrate, let me label three different cases: the ‘benign’ case, where the environment is assumed to be dumb (i.e. maxent priors are reasonable), the ‘adversary’ case, where the environment is assumed to be an agent that knows your algorithm but not your private source of randomness, and the ‘omega’ case, where the environment is assumed to be an agent that knows your algorithm and your private source of randomness.*
Eliezer thinks the phrase ‘worst case analysis’ should refer to the ‘omega’ case. Scott thinks that the ‘adversary’ case is worth doing theoretical analysis on. The first is a preference that I agree with;** the second seems reasonable. I think Silas did a good job of summarizing the power that randomness does have:
Which is why I summarize Eliezer_Yudkowsky’s position as: “Randomness is like poison. Yes, it can benefit you, but only if you use it on others.”
*There’s also a subtlety about solving the problem ‘with high probability.’ For a deterministic algorithm to have a chance at doing that, it needs to be the benign case; otherwise, the adversary decides that you fail if you left them any opening. For a randomized algorithm, the benign and adversary cases are equivalent.
**One of the things that Scott linked, on when Monte Carlo goes wrong, is something that shows up a lot, and there’s a whole industry built around generating random numbers. For any randomized algorithm, the real worst case is that you’ve got a PRNG that hates you, unless you’ve paid to get your bits from a source that’s genuinely random. If omega, rather than the adversary, were the case that came to mind when people said ‘worst case,’ this would have been more obvious. (von Neumann got it, unsurprisingly, but it’s not clear that the median CS researcher did until they noticed the explosions.)
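A minimal sketch of the deterministic memory mechanism mentioned above, assuming a toy objective over bitstrings and a single-bit-flip neighborhood (the objective, tenure, and other parameters are illustrative choices, not anything from the thread):

    # Toy tabu search: maximize a deterministic objective over bitstrings using
    # single-bit-flip moves. No randomness anywhere: the tabu list (a short-term
    # memory of recently flipped positions) forbids cycling back, which forces
    # diversification, while always taking the best admissible move gives
    # intensification.
    def objective(bits):
        # Arbitrary deterministic objective: reward 1-bits, penalize adjacent equal bits.
        return sum(bits) - sum(1 for a, b in zip(bits, bits[1:]) if a == b)

    def tabu_search(n=20, iterations=100, tenure=5):
        current = [0] * n                       # deterministic starting point
        best, best_val = current[:], objective(current)
        tabu = {}                               # position -> iteration until which it stays tabu
        for it in range(iterations):
            candidates = []
            for i in range(n):
                neighbor = current[:]
                neighbor[i] ^= 1
                val = objective(neighbor)
                # Aspiration: a tabu move is allowed anyway if it beats the best so far.
                if tabu.get(i, -1) < it or val > best_val:
                    candidates.append((val, i, neighbor))
            if not candidates:
                break
            val, i, current = max(candidates)   # best admissible move
            tabu[i] = it + tenure               # freeze the flipped bit for `tenure` steps
            if val > best_val:
                best, best_val = current[:], val
        return best, best_val

    print(tabu_search())

The tabu dictionary is the only “memory”: forbidding recently flipped bits pushes the search away from where it has just been (diversification), while always taking the best admissible move keeps it climbing (intensification), and both are fully deterministic.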
I suppose that there are really 6 different complexities. Let x be the random numbers you pick and let y be the input you get. Let f(x,y) be the time your algorithm takes on that instance. We can maximise or take the expected value for each of x and y, in either order.
1. E_x E_y f(x,y) (Average-case complexity; Eliezer’s favourite.)
2. max_y E_x f(x,y) (Worst-case complexity; Scott’s favourite.)
3. E_x max_y f(x,y) (Facing an adversary who knows what random numbers you are going to get.)
4. max_x E_y f(x,y) (You get an input at random; your random number generator tries to give you bad numbers but doesn’t know what case you’ll get.)
5. E_y max_x f(x,y) (You get an input at random; your random number generator gives you the worst numbers for that input.)
6. max_x max_y f(x,y) (‘Omega’-case complexity; your environment and your random number generator conspire to kill you.)
There’s not one best way to put a distribution on the possible inputs, so computer scientists who don’t want to do this can only look at cases 2, 3, and 6. Problems that can be solved in poly time according to 2 are the ones in BPP, and those that can be solved in poly time according to 6 are the ones in P (you don’t use your RNG if it’s trying to kill you). I don’t know if the corresponding class for 3 has a name, but it doesn’t seem very interesting.
EDIT: You also don’t want to use your RNG if you’re being judged by 4 or 5, so these both correspond to the complexity class where you judge by average time and you don’t have an RNG. Then 1 corresponds to the complexity class where you’re judged by average time and you do have an RNG. These complexity classes are known to be the same, since, as Scott says,
given any ensemble of strategies, by convexity there must be at least one deterministic strategy in the ensemble that does at least as well as the average.
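A minimal numerical sketch of the six quantities in the list above, assuming a small made-up cost table f(x, y) and uniform distributions over both the random strings x and the inputs y; the final assertion checks the convexity point in the quote (some single deterministic x does at least as well, in expectation over inputs, as averaging over x):

    # Evaluate the six orderings of max / expectation over a toy cost table.
    # x indexes the algorithm's random string, y indexes the input; both uniform here.
    # The table values are arbitrary, chosen only so the six numbers come out different.
    from statistics import mean

    f = {  # f[(x, y)] = running time on input y with random string x (made-up values)
        (0, 0): 1, (0, 1): 9, (0, 2): 2,
        (1, 0): 3, (1, 1): 8, (1, 2): 4,
        (2, 0): 7, (2, 1): 3, (2, 2): 5,
    }
    xs = sorted({x for x, _ in f})
    ys = sorted({y for _, y in f})

    def E_x(g): return mean(g(x) for x in xs)   # expectation over the random string
    def E_y(g): return mean(g(y) for y in ys)   # expectation over the input
    def M_x(g): return max(g(x) for x in xs)    # worst random string
    def M_y(g): return max(g(y) for y in ys)    # worst input

    print("1. E_x E_y    :", E_x(lambda x: E_y(lambda y: f[x, y])))  # average case
    print("2. max_y E_x  :", M_y(lambda y: E_x(lambda x: f[x, y])))  # worst-case expected time
    print("3. E_x max_y  :", E_x(lambda x: M_y(lambda y: f[x, y])))  # adversary sees your random bits
    print("4. max_x E_y  :", M_x(lambda x: E_y(lambda y: f[x, y])))  # hostile RNG, random input
    print("5. E_y max_x  :", E_y(lambda y: M_x(lambda x: f[x, y])))  # RNG worst for each input
    print("6. max_x max_y:", M_x(lambda x: M_y(lambda y: f[x, y])))  # 'omega': everything conspires

    # Convexity: the best single deterministic x is at least as good, in expectation
    # over y, as averaging over x.
    assert min(E_y(lambda y: f[x, y]) for x in xs) <= E_x(lambda x: E_y(lambda y: f[x, y]))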
Random methods are employed frequently because to do otherwise would require thought.
That means that randomness has power: it spares you the cost of thinking. Depending on the amount of thinking needed, that value can be quite substantial.
I’d agree with that part.
It also offers a safeguard against errors in your thinking. If your thinking is wrong, you might choose an algorithm that is bad for the environment you actually face. Compare that to the probability of a randomized algorithm being matched that badly by the environment.
Indeed. The grandparent is not entirely a condemnation.
“Worst case analysis” is a standard term of art in computer science that shows up as early as second-semester programming, and Eliezer will be better understood if he uses the standard term in the standard way.
A computer scientist would not describe the “omega” case as random: if the input is correlated with the random number source in a way that is detectable by the algorithm, those numbers are by definition not random.
Actually, in the context of randomized algorithms, I’ve always seen the term “worst case running time” refer to Oscar’s case 6, and “worst-case expected running time”—often somewhat misleadingly simplified to “expected running time”—refer to Oscar’s case 2.
A system that reliably behaves like the omega case is clearly not random. However, a random system such as case 2 may still occasionally behave like omega, with probability epsilon, and it is not at all unreasonable or uncommon to require your algorithm to work efficiently even in those rare cases. Thus, one might optimize a random system by modelling it as an omega system, and demanding that it works well enough even in that context.
I did not propose that worst case be interpreted as Omega or that it be given any nonstandard referent. I did suggest that “worst case” to describe the Adversary scenario is deceptive to readers, and we should ReplaceTheSymbolWithTheSubstance via a more descriptive phrase like “adversarial superintelligence that knows everything except the bits designated random”. This is what the phrase standardly means in computer science, but calling this “worst case analysis” seems to me deceptive, especially if we’re trying to conduct a meta-ish debate about the benefits of randomization, rather than talking about some particular algorithm.
Eliezer will be better understood if he uses the standard term in the standard way.
Agreed.
A computer scientist would not describe the “omega” case as random: if the input is correlated with the random number source in a way that is detectable by the algorithm, those numbers are by definition not random.
Right. But I want to repeat the objection here that we often use pseudorandomness instead of actual randomness, and then the real worst case is that we’ve gotten a cursed seed. Somewhat less practically, in situations where a real adversary may have access to our hardware, we may have to assume that they can read (or write to!) our RNG.
I think this is a practical scenario in cryptography where your threat model is state-level actors.
If your adversary can read or write bits in your hardware, then what is the purpose of using cryptography?
They may only be able to access your hardware in limited ways. For example, if a hardware “RNG” actually outputs 1,2,3,… encrypted with some key known only to the NSA, that’s essentially undetectable. But if instead they have the hardware send out extra information over the internet, sooner or later someone will notice and the game will be up.
How does the NSA sync with your “RNG” if no information is exchanged?
But anyway, if you reasonably believe that your RNG may have been compromised, then you just don’t use it.
They don’t need to sync for it to be a serious weakness in a cryptosystem. If a system using Khoth’s PRNG sends out a billion encrypted messages in its lifetime, then an attacker with the PRNG key needs less than 2^30 tries to decrypt a message sent at an unknown point in that sequence—a large number, but more than manageable when you consider that a PRNG with a period of 2^80 would be considered weak in the crypto world.
Agreed.
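To make the arithmetic above concrete, here is a toy model of the backdoored counter-style “RNG” being discussed (the construction and numbers are assumptions for illustration; SHA-256 over key-plus-counter stands in for encryption under the backdoor key). The point is that no synchronization is needed: an attacker who knows the key locates any observed output within the first N counter values in at most N guesses, which is the “less than 2^30 tries for a billion messages” arithmetic.

    # Toy backdoored "hardware RNG": output i is a keyed hash of a counter i.
    # To anyone without the key the stream looks random; an attacker who knows the
    # key never needs to sync with the device, only to brute-force the counter.
    import hashlib

    BACKDOOR_KEY = b"known-only-to-the-attacker"   # hypothetical

    def rng_output(counter: int) -> bytes:
        # Stand-in for "the counter encrypted under the backdoor key".
        return hashlib.sha256(BACKDOOR_KEY + counter.to_bytes(8, "big")).digest()

    def locate(observed: bytes, max_counter: int) -> int:
        # At most max_counter guesses (~2^30 if the device emits a billion values).
        for i in range(max_counter):
            if rng_output(i) == observed:
                return i
        raise ValueError("not within the assumed counter range")

    leaked = rng_output(123456)                    # the device's 123456th "random" value
    print(locate(leaked, max_counter=1_000_000))   # -> 123456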
Side channel attacks on hardware are not rare. For example, an adversary might have a way of measuring the power consumption of your CPU as it does RNG calculations. This is not quite the ability to “read or write bits in… hardware”, but it is a viable attack to gain information about your, ahem, random numbers.
Sure, but at this point they can also gain information on your keys or the data you wish to encrypt.
Not necessarily. Think wider: not only PCs use encrypted communications. Consider a router, for example, or a remote sensor.
Still, if they can compromise the RNG state in the router/sensor/whatever, they could probably compromise its CPU and/or RAM.
That’s not self-evident to me. Passively observing power consumption is much easier than, say, getting inside a SOC in tamper-resistant packaging.
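A deliberately simplistic model of that kind of leak (invented numbers and a Hamming-weight-style leakage assumption, not a real device or attack): if each bit the RNG processes draws measurably more power when it is a 1, an observer of the power trace can read the “random” bits straight off it.

    # Toy simple-power-analysis model: power draw per processed bit is a base level
    # plus an extra amount when the bit is 1, plus small measurement noise. An
    # observer who sees only the trace recovers the RNG's output bits by thresholding.
    import random

    def power_trace(bits, base=1.0, leak=0.5, noise=0.05):
        return [base + leak * b + random.uniform(-noise, noise) for b in bits]

    def recover_bits(trace, base=1.0, leak=0.5):
        threshold = base + leak / 2
        return [1 if sample > threshold else 0 for sample in trace]

    secret_bits = [random.randint(0, 1) for _ in range(32)]   # the device's "random numbers"
    trace = power_trace(secret_bits)                          # what the attacker measures
    assert recover_bits(trace) == secret_bits                 # recovered from power alone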
Sometimes you think you’re playing Tetris, when you’re actually playing Bastet.
Random methods are employed frequently because to do otherwise would require thought.
Sometimes. And sometimes you do not (or cannot) understand the environment well enough to commit to a deterministic optimizer, because of a lack of data rather than a lack of thought.
There is another, more charitable possible justification: sometimes you’re using such an approach to solve a problem you genuinely don’t know how to solve yourself, and the clever tricks you use may unwittingly shut off valuable areas of the search space.