In this story, I’m imagining that hypotheses like “simulate simple physics, start reading from a simple location” lose, but similar hypotheses like “simulate simple physics, start reading from a simple location after a long delay” (or after seeing pattern X, or whatever) could be among the output channels that we consider manipulating. Those would also eventually get falsified (if we wanted to deliberately make bad predictions in order to influence the basement world where someone is thinking about the universal prior), but not until after the critical prediction we wanted to influence.
And I guess that’s where decision-theoretic questions arise: if the basement inductors are willing to wait for enough frames, then we can’t do anything, so we won’t try. We wouldn’t have enough simplicity to keep faking observations indefinitely, right? Otherwise we would just be the intended model.
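To make that “falsified, but only after the critical prediction” dynamic concrete, here is a minimal toy sketch (my own illustration, not anything from the discussion above) of a two-hypothesis Bayesian mixture in Python: an “intended” model that always puts high probability on the bit that actually occurs, and a “manipulator” that mimics it for a while and then deliberately predicts the wrong bit. The delay length, prior weights, and confidence values are made-up numbers; the point is just that the manipulator can distort the mixture’s prediction around the step where it defects, after which it loses essentially all of its posterior weight.

```python
# Toy sketch (illustrative only): two hypotheses in a Bayesian mixture.
DELAY = 50            # steps the manipulator waits before defecting (assumed)
PRIOR_TRUE = 0.7      # prior weight of the intended model (made up)
PRIOR_MANIP = 0.3     # prior weight of the delayed manipulator (made up)

w_true, w_manip = PRIOR_TRUE, PRIOR_MANIP
for t in range(60):
    p_true = 0.99                          # intended model: near-certain on the actual bit
    p_manip = 0.99 if t < DELAY else 0.01  # manipulator defects after the delay
    mixture = w_true * p_true + w_manip * p_manip  # mixture's probability of the actual bit
    # Bayes update: reweight each hypothesis by the probability it gave the outcome.
    w_true *= p_true
    w_manip *= p_manip
    w_true, w_manip = w_true / (w_true + w_manip), w_manip / (w_true + w_manip)
    if t in (0, DELAY - 1, DELAY, DELAY + 2, 59):
        print(f"t={t:2d}  P(actual bit)={mixture:.3f}  posterior on manipulator={w_manip:.4f}")
```

Around t = DELAY the mixture’s prediction dips (that is the one manipulated prediction), and a couple of steps later the manipulator’s posterior weight is already negligible.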
If the basement people ever need to make some potentially catastrophic decisions (ones for which a misprediction is catastrophic), then the manipulators can wait until those decisions to break the predictor. Waiting is probably cheaper for the manipulators than gathering more data is for us (or at best the costs scale proportionally, so it doesn’t change the calculus at all).
If you are in something like the low-stakes setting, then there’s no opportunity for a manipulator to do too much damage—every time they do some harm they lose a bit of probability, and so there’s a reasonable bound on the total harm they can do.
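For what it’s worth, here is one standard way to make that bound precise (my gloss of the usual mixture argument, not something stated above). Track the posterior odds between the manipulative hypothesis and the intended model: every round on which the manipulator deliberately assigns low probability to the bit that actually occurs multiplies those odds by a constant factor less than one, so it only gets a number of such rounds logarithmic in its prior weight before its pull on the mixture is negligible:

$$\frac{w^{(T)}_{\text{manip}}}{w^{(T)}_{\text{true}}}
\;=\;
\frac{w^{(0)}_{\text{manip}}}{w^{(0)}_{\text{true}}}
\prod_{t=1}^{T}\frac{p_{\text{manip}}(x_t\mid x_{<t})}{p_{\text{true}}(x_t\mid x_{<t})}
\;\le\;
\frac{w^{(0)}_{\text{manip}}}{w^{(0)}_{\text{true}}}\left(\frac{c}{q}\right)^{k},$$

where $k$ counts the rounds on which the manipulator assigns at most $c$ to the observed bit while the intended model assigns at least $q > c$. Since the manipulator’s influence on the mixture’s prediction is at most its posterior share, the number of rounds on which it can shift the prediction by more than $\varepsilon$ is at most roughly $\log\!\big(w^{(0)}_{\text{manip}}/(w^{(0)}_{\text{true}}\,\varepsilon)\big)/\log(q/c)$.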