It’s not clear to me why the BFO would converge to a fixed point of $\mu$. If we’ve solved the problem of embedded agency and the AI system knows that $y_t$ can depend on its prediction $z_t$, then it would tend to find a fixed point, but it could also do the sort of counterfactual reasoning you say it can’t do. If we haven’t solved embedded agency, then it seems like the hypothesis that best explains the data is to posit some other classifier $h$ that works the same way the AI did in past timesteps, so that $y_t = \mu(h(x_t)) + v(h(x_t))$. Intuitively, this says that the past data is explained by a hypothetical classifier that worked the way the AI used to, and the AI now thinks one level higher than that. This probably does converge to a fixed point eventually, but at any given timestep the best hypothesis would involve only a finite number of applications of $\mu$ and $v$.
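To make the "finite number of applications of $\mu$" picture concrete, here is a minimal numerical sketch, purely illustrative and not from the post: `mu` below is a made-up contractive stand-in for the true outcome function, and the noise term $v$ is treated as zero-mean and dropped from the prediction. Each "level" of hypothesis applies $\mu$ once more, so the iterates approach the fixed point of $\mu$ without any finite level sitting exactly at it.

```python
def mu(z):
    # Hypothetical outcome function: expected y given prediction z.
    # Chosen to be a contraction purely so the fixed-point argument applies.
    return 0.5 * z + 1.0

z = 0.0  # level-0 prediction: ignore the effect of the prediction entirely
for level in range(10):
    z = mu(z)  # "think one level higher": predict the outcome given a predictor that output z
    print(f"level {level + 1}: prediction = {z:.6f}")

# Fixed point of this mu: z* = 0.5 * z* + 1.0  =>  z* = 2.0
print("fixed point: 2.0")
```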
The BFO can generally cope with humans observing $z_t = f(y_t)$
Should this be $z_t = f(x_t)$?
Thanks, corrected that.