It’s not clear to me why the BFO would converge to a fixed point of $\mu$. If we’ve solved the problem of embedded agency and the AI system knows that $y_t$ can depend on its prediction $z_t$, then it would tend to find a fixed point, but it could also do the sort of counterfactual reasoning you say it can’t do. If we haven’t solved embedded agency, then it seems like the hypothesis that best explains the data is to posit some other classifier $h$ that works the same way the AI did in past timesteps, so that $y_t = \mu(h(x_t)) + v(h(x_t))$. Intuitively, this says that the past data is explained by a hypothetical classifier that worked the way the AI used to, and the AI now thinks one level higher than that. This probably does converge to a fixed point eventually, but at any given timestep the best hypothesis would involve only a finite number of applications of $\mu$ and $v$.
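To make the "finite number of applications of $\mu$" picture concrete, here is a minimal numerical sketch, purely illustrative and not from the post: `mu` below is a made-up contractive stand-in for the true outcome function, and the noise term $v$ is treated as zero-mean and dropped from the prediction. Each "level" of hypothesis applies $\mu$ once more, so the iterates approach the fixed point of $\mu$ without any finite level sitting exactly at it.

```python
def mu(z):
    # Hypothetical outcome function: expected y given prediction z.
    # Chosen to be a contraction purely so the fixed-point argument applies.
    return 0.5 * z + 1.0

z = 0.0  # level-0 prediction: ignore the effect of the prediction entirely
for level in range(10):
    z = mu(z)  # "think one level higher": predict the outcome given a predictor that output z
    print(f"level {level + 1}: prediction = {z:.6f}")

# Fixed point of this mu: z* = 0.5 * z* + 1.0  =>  z* = 2.0
print("fixed point: 2.0")
```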
The BFO can generally cope with humans observing $z_t = f(y_t)$
Should this be $z_t = f(x_t)$?
Thanks, corrected that.