I don’t completely understand the difference between your proposal and Stuart’s counterfactual oracles; can you explain?
The practical difference is that the counterfactual oracle design doesn’t address side-channel attacks, only unsafe answers.
Internally, the counterfactual oracle is implemented via the utility function: it wants to give an answer that would be accurate if it were unread. This puts no constraints on how it gets that answer, and I don’t see any way to extend the technique to cover the reasoning process.
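To make that concrete, here is roughly the reward structure I have in mind when I say “accurate if it were unread”: a minimal toy sketch, not Stuart’s exact formalism, with an erasure probability and scoring function I’ve made up.

```python
import random

# Toy sketch of a counterfactual-oracle style reward. The names
# (ERASURE_PROB, score_fn, ground_truth_fn) are illustrative, not Stuart's.
ERASURE_PROB = 0.01  # small chance the answer is never shown to anyone

def episode_reward(oracle_answer, ground_truth_fn, score_fn):
    """Reward the oracle receives for one question-answering episode."""
    if random.random() < ERASURE_PROB:
        # Erasure event: the answer is discarded unread. Once the true
        # outcome is observed, the answer is scored against it. This is
        # the only branch that ever pays out, so the oracle is optimized
        # for accuracy in the counterfactual world where nobody reads it.
        truth = ground_truth_fn()
        return score_fn(oracle_answer, truth)
    # Normal episode: humans read the answer, but the oracle gets zero
    # reward, so the answer's content can't be used to manipulate them.
    return 0.0
```

Note that nothing in that reward says anything about the computation that produced oracle_answer, which is exactly the side-channel gap I’m pointing at.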
My proposal is implemented via a constraint on the AI’s model of the world. Whether this is actually possible depends on the details of the AI; anything of a “try random stuff, repeat whatever gets results” nature would make it impossible, but an explicitly Bayesian thing like the AIXI family would be amenable. I think this is why Stuart has been working with the utility function lately, but I don’t think you can get a safe oracle this way without either creating an agent-grade safe utility function or constructing a superintelligence-proof traditional box.
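To gesture at what “a constraint on the AI’s model of the world” could look like in the explicitly Bayesian case, here is a toy sketch: a Bayesian mixture over environment models whose hypothesis class is restricted so that no model conditions on the oracle’s own answers at all. Every name here is illustrative, not a real library or anyone’s actual formalism.

```python
from dataclasses import dataclass
from typing import Callable, List

Observation = str

@dataclass
class EnvModel:
    prior: float
    # Predicts the probability of the next observation from past
    # observations ONLY. The oracle's own answers are not an input, so
    # no hypothesis in the class can even represent "my answer was read
    # and changed the world".
    predict: Callable[[List[Observation], Observation], float]

def posterior(models: List[EnvModel], history: List[Observation]) -> List[float]:
    """Bayesian posterior over the restricted hypothesis class."""
    weights = []
    for m in models:
        w = m.prior
        for i, obs in enumerate(history):
            w *= m.predict(history[:i], obs)
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else [0.0] * len(models)
```

The restriction lives in what hypotheses the agent can represent, not in what it wants, which is why it needs something with an explicit hypothesis class to attach to; a “try random stuff, repeat whatever gets results” learner has no such handle.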