In particular, a Bayesian superintelligence must optimize some utility function under a rich prior, which requires at least structural similarity to AIXI.
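For concreteness, the AIXI structure being pointed at is the usual expectimax-over-a-universal-mixture form (the notation below is just the standard textbook one, not anything specific to this thread):

$$a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \big(r_t + \cdots + r_m\big) \sum_{\nu \in \mathcal{M}} 2^{-K(\nu)}\, \nu\!\left(o_1 r_1 \ldots o_m r_m \mid a_1 \ldots a_m\right)$$

i.e. a fixed reward signal, a mixture over computable environments weighted by description length, and full expectimax planning over future action–percept sequences.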
One model I definitely think you should look at analyzing is the approximately-Bayesian value-learning upgrade to AIXI, which has Bayesian uncertainty over the utility function as well as the world model. That looks like it might actually converge from rough alignment to alignment without requiring us to first encode the entirety of human values exactly into a single utility function.
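Roughly, and only as a sketch of the setup rather than the model's actual definition, the change is to carry Bayesian weight over a class of candidate utility functions $u \in \mathcal{U}$ alongside the environment mixture, and to act on posterior-expected utility:

$$a_t \;=\; \arg\max_{a}\; \sum_{\nu \in \mathcal{M}} \sum_{u \in \mathcal{U}} w\!\left(\nu, u \mid h_{<t}\right)\; \mathbb{E}_{\nu}\!\left[\, u(h_{1:m}) \;\middle|\; h_{<t},\, a_t = a \,\right]$$

where $w(\nu, u \mid h_{<t})$ is the joint posterior over (environment, utility-function) pairs given the interaction history so far, and the expectation is over futures generated by $\nu$ under the agent's own policy. The hope is that the posterior over $\mathcal{U}$ concentrates on an adequate utility function from evidence, rather than the right one having to be hard-coded up front.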
I’ll look into it, thanks! I linked a MIRI paper that attempts to learn the utility function, but I think it mostly kicks the problem down the road—including the true environment as an argument to the utility function seems like the first step in the right direction to me.
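To spell out the "environment as an argument" idea, again only as a sketch and not the paper's own notation: instead of scoring the observed history alone, the utility function $U$ also takes the hypothesized true environment $\nu$, and the agent evaluates its posterior-expected value,

$$V(h_{<t}) \;=\; \sum_{\nu \in \mathcal{M}} w\!\left(\nu \mid h_{<t}\right)\; \mathbb{E}_{\nu}\!\left[\, U\!\left(\nu,\, h_{1:m}\right) \;\middle|\; h_{<t} \,\right],$$

so that what the agent values depends on which world it is actually in, not just on what its percept stream looks like.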