If you want to be so hardcore Bayesian that you even look down on randomized algorithms in software
Do such people exist? I don’t know anyone who fits such a description.
I take this to be evidence that the “never use randomness” view would call for software that most engineers would consider poorly-designed, and as such is an unproductive view from the perspective of designing good software.
Well, of course. But again, has anyone been arguing against that? The original dispute was talking about abstract optimality and whether particular solutions exist. No one suggested that in real-world practice at the current level of technology rand() Call Considered Harmful.
Well, let’s put it this way. If you can implement a non-randomized solution that works within the specified constraints (time, etc.), it is preferable to a probabilistic one. The real question is what happens when you can’t. I doubt that Eliezer would refuse to implement a probabilistic solution on the grounds that it’s not pure enough and so no solution at all is better than a version tainted by its contact with the RNG.
An exception might be when you need to be absolutely sure what the software does or does not do and any randomness introduces an unacceptable risk. But such a case is very rare.
I doubt that Eliezer would refuse to implement a probabilistic solution on the grounds that it’s not pure enough and so no solution at all is better than a version tainted by its contact with the RNG.
Sure, I agree with this. But he still seems to think that in principle it is always possible to improve over a randomized algorithm. But doing so requires having some knowledge of the distribution over the environment, and that would break modularity.
Whether or not Eliezer himself is basing this argument on Bayesian grounds, it certainly seems to be the case that many commenters are, e.g.:
However if it is an important problem and you think you might be able to find some regularities, the best bet would to be to do bayesian updates on the most likely positions to be ones and preferentially choose those
Quantum branching is “truly random” in the sense that branching the universe both ways is an irreducible source of new indexical ignorance. But the more important point is that unless the environment really is out to get you, you might be wiser to exploit the ways that you think the environment might depart from true randomness, rather than using a noise source to throw away this knowledge.
I certainly don’t say “it’s not hard work”, and the environmental probability distribution should not look like the probability distribution you have over your random numbers—it should contain correlations and structure. But once you know what your probability distribution is, then you should do your work relative to that, rather than assuming “worst case”. Optimizing for the worst case in environments that aren’t actually adversarial, makes even less sense than assuming the environment is as random and unstructured as thermal noise.
Another way of swapping around the question is to ask under what circumstances Jacob Steinhardt would refuse to use a PRNG rather than an RNG because the PRNG wasn’t random enough, and whether there’s any instance of such that doesn’t involve an intelligent adversary (or that ancient crude PRNG with bad distributional properties that everyone always cites when this topic comes up, i.e., has that happened more recently with an OK-appearing PRNG).
Obviously I don’t intend to take a stance on the math-qua-math question of P vs. BPP. But to the extent that someone has to assert that an algorithm’s good BPP-related properties only work for an RNG rather than a PRNG, and there’s no intelligent adversary of any kind involved in the system, I have to question whether this could reasonably happen in real life. Having written that sentence it doesn’t feel very clear to me. What I’m trying to point at generally is that unless I have an intelligent adversary I don’t want my understanding of a piece of code to depend on whether a particular zero bit is “deterministic” or “random”. I want my understanding to say that the code has just the same effect once the zero is generated, regardless of what factors generated the zero; I want to be able to screen off the “randomness” once I’ve looked at the output of that randomness, and just ask about the effectiveness of using a zero here or a one there. Furthermore I distrust any paradigm which doesn’t look like that, and reject it as something I could really-truly believe, until the business about “randomness” has been screened off and eliminated from the analysis. Unless I’m trying to evade a cryptographic adversary who really can predict me if I choose the wrong PRNG or write down my random bits someplace that someone else can see them, so that writing down the output of an RNG and then feeding it into the computation as a deterministic constant is genuinely worse because my adversary might sneak a look at the RNG’s output if I left it written down anywhere. Or I’m trying to randomize a study and prevent accidental correlations with other people’s study, so I use an RNG just in case somebody else used a similar PRNG.
But otherwise I don’t like my math treating the same bit differently depending on whether it’s “random” or “deterministic” because its actual effect on the algorithm is the same and ought to be screened off from its origins once it becomes a 1 or 0.
(And there’s also a deep Bayesian issue here regarding, e.g., our ability to actually look at the contents of an envelope in the two-envelope problem and update our prior about amounts of money in envelopes to arrive at the posterior, rather than finding it intuitive to think that we picked an envelope randomly and that the randomized version of this algorithm will initially pick the envelope containing the larger amount of money half the time, which I think is a very clear illustration of the Bad Confused Thoughts into which you’re liable to be led down a garden-path, if you operate in a paradigm that doesn’t find it intuitive to look at the actual value of the random bit and ask about what we think about that actual value apart from the “random” process that supposedly generated it. But this issue the margins are too small to contain.)
Obviously I don’t intend to take a stance on the math-qua-math question of P vs. BPP. But to the extent that someone has to assert that an algorithm’s good BPP-related properties only work for an RNG rather than a PRNG, and there’s no intelligent adversary of any kind involved in the system, I have to question whether this could reasonably happen in real life.
If your question is “Is there a known practical case not involving an intelligent adversarial environment where the use of a cryptographic PRNG or even a true RNG rather than a non-cryptographic PRNG is warranted?” Then the answer is no. In fact, this is the reason why it is conjectured that P = BPP.
However, given the rest of your comment, it seems that you are referring to how we understand the theoretical properties of algorithms:
What I’m trying to point at generally is that unless I have an intelligent adversary I don’t want my understanding of a piece of code to depend on whether a particular zero bit is “deterministic” or “random”. I want my understanding to say that the code has just the same effect once the zero is generated, regardless of what factors generated the zero; I want to be able to screen off the “randomness” once I’ve looked at the output of that randomness, and just ask about the effectiveness of using a zero here or a one there. Furthermore I distrust any paradigm which doesn’t look like that, and reject it as something I could really-truly believe, until the business about “randomness” has been screened off and eliminated from the analysis.
If we are talking about understanding the theoretical properties of many useful randomized algorithms, I’d say that we can’t “screen off” randomness. Even if the algorithm is implemented using a PRNG with a constant seed, and is thus fully deterministic at the level of what actually runs on the machine, when we reason about its theoretical properties, whether it is a formal analysis or a pre-formal intuitive analysis, we need to abstract away the PRNG and assume that the algorithm has access to true randomness.
As it was already pointed out, if we were perfectly rational Bayesian agents, then, we would always be able to include the PRNG into our analysis: For instance, given a machine learning problem, a Bayesian agent would model it as a subjective probability distribution, and it may conclude that for that particular distribution the optimal algorithm is an implementation of Random Forest algorithm with the Mersenne Twister PRNG initialized with seed 42. For a slightly different subjective distribution, the optimal algorithm may be an implementation of a Neural Network trained with Error Backpropagation with weights initialized by a Linear Congruentlial PRNG with seed 12345. For another slightly different subjective distribution, the optimal algorithm may be some entirely different deterministic algorithm.
In practice, we can’t reason this way. Therefore we assume true randomness in order to say meaningful things about many practically useful algorithms.
A more involved post about those Bad Confused Thoughts and the deep Bayesian issue underlying it would be really interesting, when and if you ever have time for it.
in principle it is always possible to improve over a randomized algorithm
in principle are the key words.
As the old saying goes, in theory there is no difference between theory and practice but in practice there is.
requires having some knowledge of the distribution over the environment, and that would break modularity.
You are using the word “modularity” in a sense weird to me. From my perspective, in the software context “modularity” refers to interactions of pieces of software, not inputs or distributions over the environment.
Do such people exist? I don’t know anyone who fits such a description.
Well, of course. But again, has anyone been arguing against that? The original dispute was talking about abstract optimality and whether particular solutions exist. No one suggested that in real-world practice at the current level of technology rand() Call Considered Harmful.
What about Eliezer?
Well, let’s put it this way. If you can implement a non-randomized solution that works within the specified constraints (time, etc.), it is preferable to a probabilistic one. The real question is what happens when you can’t. I doubt that Eliezer would refuse to implement a probabilistic solution on the grounds that it’s not pure enough and so no solution at all is better than a version tainted by its contact with the RNG.
An exception might be when you need to be absolutely sure what the software does or does not do and any randomness introduces an unacceptable risk. But such a case is very rare.
Sure, I agree with this. But he still seems to think that in principle it is always possible to improve over a randomized algorithm. But doing so requires having some knowledge of the distribution over the environment, and that would break modularity.
Whether or not Eliezer himself is basing this argument on Bayesian grounds, it certainly seems to be the case that many commenters are, e.g.:
Will_Pearson:
DonGeddis:
And some comments by Eliezer:
Eliezer_Yudkowsky:
Eliezer_Yudkowsky:
Another way of swapping around the question is to ask under what circumstances Jacob Steinhardt would refuse to use a PRNG rather than an RNG because the PRNG wasn’t random enough, and whether there’s any instance of such that doesn’t involve an intelligent adversary (or that ancient crude PRNG with bad distributional properties that everyone always cites when this topic comes up, i.e., has that happened more recently with an OK-appearing PRNG).
Obviously I don’t intend to take a stance on the math-qua-math question of P vs. BPP. But to the extent that someone has to assert that an algorithm’s good BPP-related properties only work for an RNG rather than a PRNG, and there’s no intelligent adversary of any kind involved in the system, I have to question whether this could reasonably happen in real life. Having written that sentence it doesn’t feel very clear to me. What I’m trying to point at generally is that unless I have an intelligent adversary I don’t want my understanding of a piece of code to depend on whether a particular zero bit is “deterministic” or “random”. I want my understanding to say that the code has just the same effect once the zero is generated, regardless of what factors generated the zero; I want to be able to screen off the “randomness” once I’ve looked at the output of that randomness, and just ask about the effectiveness of using a zero here or a one there. Furthermore I distrust any paradigm which doesn’t look like that, and reject it as something I could really-truly believe, until the business about “randomness” has been screened off and eliminated from the analysis. Unless I’m trying to evade a cryptographic adversary who really can predict me if I choose the wrong PRNG or write down my random bits someplace that someone else can see them, so that writing down the output of an RNG and then feeding it into the computation as a deterministic constant is genuinely worse because my adversary might sneak a look at the RNG’s output if I left it written down anywhere. Or I’m trying to randomize a study and prevent accidental correlations with other people’s study, so I use an RNG just in case somebody else used a similar PRNG.
But otherwise I don’t like my math treating the same bit differently depending on whether it’s “random” or “deterministic” because its actual effect on the algorithm is the same and ought to be screened off from its origins once it becomes a 1 or 0.
(And there’s also a deep Bayesian issue here regarding, e.g., our ability to actually look at the contents of an envelope in the two-envelope problem and update our prior about amounts of money in envelopes to arrive at the posterior, rather than finding it intuitive to think that we picked an envelope randomly and that the randomized version of this algorithm will initially pick the envelope containing the larger amount of money half the time, which I think is a very clear illustration of the Bad Confused Thoughts into which you’re liable to be led down a garden-path, if you operate in a paradigm that doesn’t find it intuitive to look at the actual value of the random bit and ask about what we think about that actual value apart from the “random” process that supposedly generated it. But this issue the margins are too small to contain.)
Is that helpful?
If your question is “Is there a known practical case not involving an intelligent adversarial environment where the use of a cryptographic PRNG or even a true RNG rather than a non-cryptographic PRNG is warranted?” Then the answer is no.
In fact, this is the reason why it is conjectured that P = BPP.
However, given the rest of your comment, it seems that you are referring to how we understand the theoretical properties of algorithms:
If we are talking about understanding the theoretical properties of many useful randomized algorithms, I’d say that we can’t “screen off” randomness.
Even if the algorithm is implemented using a PRNG with a constant seed, and is thus fully deterministic at the level of what actually runs on the machine, when we reason about its theoretical properties, whether it is a formal analysis or a pre-formal intuitive analysis, we need to abstract away the PRNG and assume that the algorithm has access to true randomness.
As it was already pointed out, if we were perfectly rational Bayesian agents, then, we would always be able to include the PRNG into our analysis:
For instance, given a machine learning problem, a Bayesian agent would model it as a subjective probability distribution, and it may conclude that for that particular distribution the optimal algorithm is an implementation of Random Forest algorithm with the Mersenne Twister PRNG initialized with seed 42.
For a slightly different subjective distribution, the optimal algorithm may be an implementation of a Neural Network trained with Error Backpropagation with weights initialized by a Linear Congruentlial PRNG with seed 12345.
For another slightly different subjective distribution, the optimal algorithm may be some entirely different deterministic algorithm.
In practice, we can’t reason this way. Therefore we assume true randomness in order to say meaningful things about many practically useful algorithms.
A more involved post about those Bad Confused Thoughts and the deep Bayesian issue underlying it would be really interesting, when and if you ever have time for it.
in principle are the key words.
As the old saying goes, in theory there is no difference between theory and practice but in practice there is.
You are using the word “modularity” in a sense weird to me. From my perspective, in the software context “modularity” refers to interactions of pieces of software, not inputs or distributions over the environment.
Based on the discusssions with you and trist, I updated the original text of that section substantially. Let me know if it’s clearer now what I mean.
Also, thanks for the feedback so far!