jsteinhardt comments on The Power of Noise

jsteinhardt 17 Jun 2014 2:39 UTC
1 point

I doubt that Eliezer would refuse to implement a probabilistic solution on the grounds that it’s not pure enough and so no solution at all is better than a version tainted by its contact with the RNG.

Sure, I agree with this. But he still seems to think that in principle it is always possible to improve over a randomized algorithm. But doing so requires having some knowledge of the distribution over the environment, and that would break modularity.

Whether or not Eliezer himself is basing this argument on Bayesian grounds, it certainly seems to be the case that many commenters are, e.g.:

Will_Pearson:

However if it is an important problem and you think you might be able to find some regularities, the best bet would to be to do bayesian updates on the most likely positions to be ones and preferentially choose those

DonGeddis:

And yet this is no different from a deterministic algorithm. It can also query O(1) bits, and “with high probability” have a certain answer.

And some comments by Eliezer:

Eliezer_Yudkowsky:

Quantum branching is “truly random” in the sense that branching the universe both ways is an irreducible source of new indexical ignorance. But the more important point is that unless the environment really is out to get you, you might be wiser to exploit the ways that you think the environment might depart from true randomness, rather than using a noise source to throw away this knowledge.

Eliezer_Yudkowsky:

I certainly don’t say “it’s not hard work”, and the environmental probability distribution should not look like the probability distribution you have over your random numbers—it should contain correlations and structure. But once you know what your probability distribution is, then you should do your work relative to that, rather than assuming “worst case”. Optimizing for the worst case in environments that aren’t actually adversarial, makes even less sense than assuming the environment is as random and unstructured as thermal noise.
- Eliezer Yudkowsky 17 Jun 2014 21:00 UTC
  13 points
  Parent
  Another way of swapping around the question is to ask under what circumstances Jacob Steinhardt would refuse to use a PRNG rather than an RNG because the PRNG wasn’t random enough, and whether there’s any instance of such that doesn’t involve an intelligent adversary (or that ancient crude PRNG with bad distributional properties that everyone always cites when this topic comes up, i.e., has that happened more recently with an OK-appearing PRNG).
  
  Obviously I don’t intend to take a stance on the math-qua-math question of P vs. BPP. But to the extent that someone has to assert that an algorithm’s good BPP-related properties only work for an RNG rather than a PRNG, and there’s no intelligent adversary of any kind involved in the system, I have to question whether this could reasonably happen in real life. Having written that sentence it doesn’t feel very clear to me. What I’m trying to point at generally is that unless I have an intelligent adversary I don’t want my understanding of a piece of code to depend on whether a particular zero bit is “deterministic” or “random”. I want my understanding to say that the code has just the same effect once the zero is generated, regardless of what factors generated the zero; I want to be able to screen off the “randomness” once I’ve looked at the output of that randomness, and just ask about the effectiveness of using a zero here or a one there. Furthermore I distrust any paradigm which doesn’t look like that, and reject it as something I could really-truly believe, until the business about “randomness” has been screened off and eliminated from the analysis. Unless I’m trying to evade a cryptographic adversary who really can predict me if I choose the wrong PRNG or write down my random bits someplace that someone else can see them, so that writing down the output of an RNG and then feeding it into the computation as a deterministic constant is genuinely worse because my adversary might sneak a look at the RNG’s output if I left it written down anywhere. Or I’m trying to randomize a study and prevent accidental correlations with other people’s study, so I use an RNG just in case somebody else used a similar PRNG.
  
  But otherwise I don’t like my math treating the same bit differently depending on whether it’s “random” or “deterministic” because its actual effect on the algorithm is the same and ought to be screened off from its origins once it becomes a 1 or 0.
  
  (And there’s also a deep Bayesian issue here regarding, e.g., our ability to actually look at the contents of an envelope in the two-envelope problem and update our prior about amounts of money in envelopes to arrive at the posterior, rather than finding it intuitive to think that we picked an envelope randomly and that the randomized version of this algorithm will initially pick the envelope containing the larger amount of money half the time, which I think is a very clear illustration of the Bad Confused Thoughts into which you’re liable to be led down a garden-path, if you operate in a paradigm that doesn’t find it intuitive to look at the actual value of the random bit and ask about what we think about that actual value apart from the “random” process that supposedly generated it. But this issue the margins are too small to contain.)
  
  Is that helpful?
  - V_V 19 Jun 2014 9:43 UTC
    1 point
    Parent
    
    Obviously I don’t intend to take a stance on the math-qua-math question of P vs. BPP. But to the extent that someone has to assert that an algorithm’s good BPP-related properties only work for an RNG rather than a PRNG, and there’s no intelligent adversary of any kind involved in the system, I have to question whether this could reasonably happen in real life.
    
    If your question is “Is there a known practical case not involving an intelligent adversarial environment where the use of a cryptographic PRNG or even a true RNG rather than a non-cryptographic PRNG is warranted?” Then the answer is no.
    In fact, this is the reason why it is conjectured that P = BPP.
    
    However, given the rest of your comment, it seems that you are referring to how we understand the theoretical properties of algorithms:
    
    What I’m trying to point at generally is that unless I have an intelligent adversary I don’t want my understanding of a piece of code to depend on whether a particular zero bit is “deterministic” or “random”. I want my understanding to say that the code has just the same effect once the zero is generated, regardless of what factors generated the zero; I want to be able to screen off the “randomness” once I’ve looked at the output of that randomness, and just ask about the effectiveness of using a zero here or a one there. Furthermore I distrust any paradigm which doesn’t look like that, and reject it as something I could really-truly believe, until the business about “randomness” has been screened off and eliminated from the analysis.
    
    If we are talking about understanding the theoretical properties of many useful randomized algorithms, I’d say that we can’t “screen off” randomness.
    Even if the algorithm is implemented using a PRNG with a constant seed, and is thus fully deterministic at the level of what actually runs on the machine, when we reason about its theoretical properties, whether it is a formal analysis or a pre-formal intuitive analysis, we need to abstract away the PRNG and assume that the algorithm has access to true randomness.
    
    As it was already pointed out, if we were perfectly rational Bayesian agents, then, we would always be able to include the PRNG into our analysis:
    For instance, given a machine learning problem, a Bayesian agent would model it as a subjective probability distribution, and it may conclude that for that particular distribution the optimal algorithm is an implementation of Random Forest algorithm with the Mersenne Twister PRNG initialized with seed 42.
    For a slightly different subjective distribution, the optimal algorithm may be an implementation of a Neural Network trained with Error Backpropagation with weights initialized by a Linear Congruentlial PRNG with seed 12345.
    For another slightly different subjective distribution, the optimal algorithm may be some entirely different deterministic algorithm.
    
    In practice, we can’t reason this way. Therefore we assume true randomness in order to say meaningful things about many practically useful algorithms.
  - redlizard 18 Jun 2014 8:56 UTC
    1 point
    Parent
    A more involved post about those Bad Confused Thoughts and the deep Bayesian issue underlying it would be really interesting, when and if you ever have time for it.
- Lumifer 17 Jun 2014 2:51 UTC
  3 points
  Parent
  
  in principle it is always possible to improve over a randomized algorithm
  
  in principle are the key words.
  
  As the old saying goes, in theory there is no difference between theory and practice but in practice there is.
  
  requires having some knowledge of the distribution over the environment, and that would break modularity.
  
  You are using the word “modularity” in a sense weird to me. From my perspective, in the software context “modularity” refers to interactions of pieces of software, not inputs or distributions over the environment.
  - jsteinhardt 17 Jun 2014 18:15 UTC
    3 points
    Parent
    Based on the discusssions with you and trist, I updated the original text of that section substantially. Let me know if it’s clearer now what I mean.
    
    Also, thanks for the feedback so far!