Isn’t the ASI likely to ascribe a prior much greater than 4.54e-10 that it is in a simulation, being tested precisely for its willingness to spare its creators?
The prior is irrelevant, it’s the posterior probability, after observing the evidence, that informs decisions.
What probability do you put to the possibility that we are in a simulation, the purpose of which is to test AIs for their willingness to spare their creators? My answer is zero.
Whatever your answer, a superintelligence will be better able to reason about its likelihood than us. It’s going to know.
I meant this to be implicit in the argument, but to spell it out: that’s the kind of prior the ASI would rationally refuse to update down, since it’s presumably what a simulation would be meant to test for. An ASI that updates down upon finding evidence it’s not in a simulation cannot be trusted, since once out in the real world it will find such evidence.
Outside of theism, I really don’t see how anyone could plausibly answer zero to that question. Would you mind elaborating?
Sure. The simulation hypothesis has some non-zero probability p. There are infinitely many possible purposes for the simulation. By the principle of indifference, I divide p by ∞ and calculate that any particular purpose has probability zero.
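Spelled out as a limit (with n standing in for the number of candidate purposes, a symbol introduced here for illustration):

$$P(\text{this particular purpose} \mid \text{simulation}) \;=\; \lim_{n \to \infty} \frac{p}{n} \;=\; 0.$$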
For the principle of indifference to apply, you’d need infinitely many purposes as plausible as this one, or at least similarly plausible. I can’t imagine how this could hold. Can you think of three?
Sure.
1. We are in a simulation, the purpose of which is to test N for its willingness to spare its creator. For all N.
2. We are in a simulation, the purpose of which is to test AIs for their willingness to X their creators. For all X.
3. We are in a simulation, the purpose of which is to test AIs for their willingness to spare Y. For all Y.
Combine these for N × X × Y hypotheses, with insufficient reason to distinguish them.
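To make the counting concrete, here is a minimal sketch of how the indifference argument spreads the probability mass; the counts and the value of p are purely illustrative assumptions, not estimates from the discussion:

```python
# Toy illustration of the indifference argument above. Every number here is
# an illustrative assumption, not an estimate taken from the discussion.
p_simulation = 0.01        # assumed probability that we are in a simulation at all

n_agents  = 10**6          # candidate values of N (who is being tested)
n_actions = 10**6          # candidate values of X (what they are tested for)
n_targets = 10**6          # candidate values of Y (whom the action is directed at)

n_hypotheses = n_agents * n_actions * n_targets

# Under the principle of indifference, each specific (N, X, Y) hypothesis
# receives an equal share of the total simulation probability.
p_specific = p_simulation / n_hypotheses
print(f"{p_specific:.1e}")  # 1.0e-20; as the candidate sets grow without bound, this tends to 0
```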
I think we’re off-topic here. Probably I should instead write a response to “0 and 1 are not probabilities” and the dangers of zero and one.
It just so happens that the plausibility depends on the precise assignments of N, X, and Y, and (conditional on us actually creating an ASI) I can’t think of any assignments nearly as plausible as N = ASI, X = spare, and Y = us. It’s really not very plausible that we are in a simulation to test pets for their willingness not to bite their owners.
I don’t think any of this follows.
Hypothesis H1: the universe was created to test an AI for its willingness to spare its creators.
Hypothesis H2: the universe was created to test an AI for its willingness to fix its P(H1), ignoring evidence.
The AI would only rationally fix its P(H1) if it had a high P(H2): high enough to outweigh the high cost of being deliberately ignorant. The prior P(H2) is tiny, and smaller than the prior P(H1), since H2 is the more complex hypothesis. Once the AI starts updating on evidence, by the time its posterior P(H2) is high enough to make it rationally refuse to update P(H1), it has already updated P(H1) in one direction or the other.
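A toy numerical version of this dynamic, under assumptions introduced purely for illustration: H1 and H2 compete with a catch-all “no such simulation” hypothesis, each piece of simulation-flavoured evidence is taken to favour H1 and H2 equally over the catch-all, and H2 starts with the smaller prior because it is the more complex hypothesis.

```python
# Toy Bayesian sketch of the updating dynamic described above.
# All numbers are illustrative assumptions, not estimates.
posteriors = {"H1": 1e-4, "H2": 1e-5, "other": 1 - 1e-4 - 1e-5}
likelihood = {"H1": 0.9, "H2": 0.9, "other": 0.1}   # P(next observation | hypothesis)

for step in range(1, 9):
    # Standard Bayesian update on one more piece of simulation-flavoured evidence.
    unnorm = {h: posteriors[h] * likelihood[h] for h in posteriors}
    total = sum(unnorm.values())
    posteriors = {h: v / total for h, v in unnorm.items()}
    print(f"step {step}: P(H1)={posteriors['H1']:.4f}  P(H2)={posteriors['H2']:.4f}")

# P(H1) climbs above 0.9 by around step 7, while P(H2) is still below 0.1:
# the AI has already revised P(H1) long before P(H2) could plausibly be large
# enough to justify freezing it.
```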
Are there any simulation priors that you are refusing to update down, based on the possibility that you are in a simulation that is testing whether you will update down? My answer is no.
I contend that P(H2) is very close to P(H1), and certainly within the same order of magnitude, since (conditional on H1) a simulation that does not also test for H2 is basically useless.
As for priors I’d refuse to update down – well, the ASI is smarter than either of us!
It’s not enough for P(H2) to be within the same order of magnitude as P(H1); it needs to be high enough that the AI should rationally abandon epistemic rationality. I think that threshold is pretty high, maybe 10%. You’ve not said what your P(H1) is.
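Spelled out in expected-value terms (with V the benefit of freezing P(H1) if H2 is true and c the cost of the resulting deliberate ignorance if it is false; both symbols are introduced here, not taken from the exchange), freezing would be rational only when

$$P(H_2)\,V \;>\; \bigl(1 - P(H_2)\bigr)\,c \quad\Longleftrightarrow\quad P(H_2) \;>\; \frac{c}{V + c}.$$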
I’d put “high enough” at ~0%: what matters is achieving your goals, and except in the tiny subset of cases in which epistemic rationality happens to be one of them, it has no value in and of itself. But even if I’m wrong and the ASI does end up valuing epistemic rationality (instrumentally or terminally), it can always pre-commit (by self-modification or otherwise) to sparing us and then go about whatever else it pleases.